## What are covariance and contravariance?

July 21, 2017

Subtyping is a tricky topic in programming language theory. The trickiness comes from a pair of frequently misunderstood phenomena called *covariance* and *contravariance*. This article will explain what these terms mean.

The following notation will be used:

- `A <: B` means `A` is a subtype of `B`.
- `A -> B` is the type of functions from `A` to `B`.
- `e : T` means the expression `e` has type `T`.

### A motivating question

Suppose I have these three types:

    Greyhound <: Dog <: Animal

So `Greyhound` is a subtype of `Dog`, and `Dog` is a subtype of `Animal`. Subtyping is usually transitive, so we’ll say `Greyhound` is also a subtype of `Animal`.

**Question:** Which of the following types could be subtypes of `Dog -> Dog`?

1. `Greyhound -> Greyhound`
2. `Greyhound -> Animal`
3. `Animal -> Animal`
4. `Animal -> Greyhound`

How do we answer this question? Let `f` be a function which takes a `Dog -> Dog` function as its argument. We don’t care about the return type. For concreteness, we can say `f : (Dog -> Dog) -> String`.

Now I want to call `f` with some function `g`. Let’s see what happens when `g` has each of the four types above.

**1. Suppose `g : Greyhound -> Greyhound`. Is `f(g)` type safe?**

No, because `f` might try to call its argument (`g`) with a different subtype of `Dog`, like a `GermanShepherd`.

**2. Suppose `g : Greyhound -> Animal`. Is `f(g)` type safe?**

No, for the same reason as (1).

**3. Suppose `g : Animal -> Animal`. Is `f(g)` type safe?**

No, because `f` might call its argument (`g`) and then try to make the return value bark. Not every `Animal` can bark.

**4. Suppose `g : Animal -> Greyhound`. Is `f(g)` type safe?**

Yes, this one is safe. `f` might call its argument (`g`) with any kind of `Dog`, and all `Dog`s are `Animal`s. Likewise, it may assume the result is a `Dog`, and all `Greyhound`s are `Dog`s.

### What’s going on?

So this is safe:

    (Animal -> Greyhound) <: (Dog -> Dog)

The return types are straightforward: `Greyhound` is a subtype of `Dog`. But the argument types are flipped around: `Animal` is a *supertype* of `Dog`!

To state this strange behavior in the proper jargon, we allow function types to be *covariant* in their return type and *contravariant* in their argument type. Covariance in the return type means `A <: B` implies `(T -> A) <: (T -> B)` (`A` stays on the left of the `<:`, and `B` stays on the right). Contravariance in the argument type means `A <: B` implies `(B -> T) <: (A -> T)` (`A` and `B` flipped sides).

**Fun fact:** In TypeScript, argument types are *bivariant* (both covariant and contravariant) by default, which is unsound. Since version 2.6, the `--strictFunctionTypes` flag makes the parameters of function types contravariant. Eiffel also got this wrong, making argument types covariant instead of contravariant.
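To make the rule concrete, here is a minimal sketch in Haskell. Haskell has no subtyping, so the sketch models `A <: B` as an explicit upcast function of type `A -> B`; the types and the names `greyhoundToDog`, `dogToAnimal`, `asDogToDog`, `adopt`, and `f` are all made up for illustration. The point is that an `Animal -> Greyhound` can always be turned into a `Dog -> Dog` by widening the argument on the way in and widening the result on the way out.

```haskell
-- Haskell has no subtyping, so we model `A <: B` as an explicit
-- upcast function of type A -> B. All names here are hypothetical.
newtype Animal = Animal String deriving (Eq, Show)
newtype Dog = Dog String deriving (Eq, Show)
newtype Greyhound = Greyhound String deriving (Eq, Show)

greyhoundToDog :: Greyhound -> Dog
greyhoundToDog (Greyhound name) = Dog name

dogToAnimal :: Dog -> Animal
dogToAnimal (Dog name) = Animal name

-- Witness for (Animal -> Greyhound) <: (Dog -> Dog):
-- widen the argument before the call (contravariance),
-- widen the result after it (covariance).
asDogToDog :: (Animal -> Greyhound) -> (Dog -> Dog)
asDogToDog g = greyhoundToDog . g . dogToAnimal

-- An example g : Animal -> Greyhound.
adopt :: Animal -> Greyhound
adopt (Animal name) = Greyhound (name ++ " Jr.")

-- An example f : (Dog -> Dog) -> String.
f :: (Dog -> Dog) -> String
f h = case h (Dog "Rex") of Dog name -> name

main :: IO ()
main = putStrLn (f (asDogToDog adopt))  -- prints "Rex Jr."
```

No analogous conversion exists for `Greyhound -> Greyhound`: it would need a function from `Dog` to `Greyhound`, which is a downcast rather than an upcast.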

### What about other types?

**Question:** Could `List<Dog>` be a subtype of `List<Animal>`?

The answer is a little nuanced. If lists are immutable, then it’s safe to say yes. But if lists are mutable, then definitely not!

Why? Suppose I need a `List<Animal>` and you pass me a `List<Dog>`. Since I think I have a `List<Animal>`, I might try to insert a `Cat` into it. Now your `List<Dog>` has a `Cat` in it! The type system should not allow this.

Formally: we can allow the type of immutable lists to be covariant in its type parameter, but the type of mutable lists must be *invariant* (neither covariant nor contravariant) in its type parameter.

**Fun fact:** In Java, arrays are both mutable and covariant. This is, of course, unsound.

## Type safe dimensional analysis in Haskell

July 16, 2017

Years ago my colleague Gustavo asked how I would represent physical units like `m/s` or `kg*m/s^2` as types so the compiler can check that they match up and cancel correctly. F# supports this natively, but it felt weird to have it baked into the type system. It seemed too ad hoc, though I didn’t know of anything better.

Today I was thinking about this again, and I found a way to do it in Haskell. The main idea is to represent units of measure as function spaces. For example, the unit `m/s` can be encoded as the type `Seconds -> Meters`. The numerator becomes the return type, and the denominator the argument type. Function types can be composed to form more interesting units, such as `Seconds -> Seconds -> Meters` for acceleration. Products can be represented by higher-order functions. A special type class will enable us to easily convert a `Double` into a quantity with any units of measure and vice versa.

There are a handful of packages on Hackage for doing dimensional analysis. This article will demonstrate a simple, portable way to do it without relying on any language extensions. As we will see in *Example 3*, this implementation sometimes requires the programmer to provide proofs of unit equivalence (which can be tedious at times). This is less convenient than other libraries, but it’s an interesting exhibition of the power of vanilla Haskell.

### Units as function spaces

First, we define some types for base units. We will never instantiate these types, so we don’t specify any constructors. They are only used for type checking.

    data Meter
    data Kilogram
    data Second

Next, we need types which actually represent quantities with units. For the base units above, we define `BaseQuantity` as follows:

    newtype BaseQuantity a = BaseQuantity Double

It’s just a wrapper for `Double`, but written as a phantom type. The type parameter `a` keeps track of the base unit. For example, `BaseQuantity Meter` is a type which represents a length.

A quotient `a / b` of two units `a` and `b` will be represented by the function space `b -> a`. For example, `m / s` becomes `BaseQuantity Second -> BaseQuantity Meter`. To make intentions clear, we formalize this idea as a type synonym:

    type Quotient a b = b -> a

We also need a `BaseQuantity` for dimensionless quantities like π. We could define a new base type for this, but `()` does the job nicely:

    type Dimensionless = BaseQuantity ()

We can also define the multiplicative inverse `a^-1` as the quotient `1 / a`:

    type Inverse a = Quotient Dimensionless a

A product `a * b` can be represented as `a / b^-1`:

    type Product a b = Quotient a (Inverse b)

A helpful synonym to make square units like `m^2` easier to read:

    type Square a = Product a a

All quantities have some numeric value. We formalize this in Haskell using a type class:

    class Quantity a where
      construct :: Double -> a
      destruct :: a -> Double

The instance for base quantities is trivial, since `BaseQuantity` just wraps a `Double`:

    instance Quantity (BaseQuantity a) where
      construct = BaseQuantity
      destruct (BaseQuantity x) = x

Quotients of quantities are quantities as well. To construct a `Quotient` from a `Double`, we define a function which destructs its argument, multiplies the result by the given `Double`, and constructs a quantity of the return type (the numerator unit). To destruct a `Quotient`, we first construct `1` in the denominator unit (the argument type), use the quotient to convert it into the numerator unit, and destruct the result.

    instance (Quantity q, Quantity r) => Quantity (q -> r) where
      construct x = \y -> construct (x * (destruct y))
      destruct x = destruct (x (construct 1))
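As a quick sanity check of the function-space instance, here is a small self-contained sketch. It repeats the definitions above so it compiles on its own; the `Velocity` synonym and the numbers are only illustrative.

```haskell
-- Definitions repeated from the article so the sketch stands alone.
data Meter
data Second

newtype BaseQuantity a = BaseQuantity Double

class Quantity a where
  construct :: Double -> a
  destruct :: a -> Double

instance Quantity (BaseQuantity a) where
  construct = BaseQuantity
  destruct (BaseQuantity x) = x

instance (Quantity q, Quantity r) => Quantity (q -> r) where
  construct x = \y -> construct (x * destruct y)
  destruct f = destruct (f (construct 1))

type Quotient a b = b -> a
type Velocity = Quotient (BaseQuantity Meter) (BaseQuantity Second)

main :: IO ()
main = do
  let v = construct 30 :: Velocity
  -- Applying v to 2 seconds yields 60 meters.
  print (destruct (v (construct 2)))  -- 60.0
  -- Destructing v feeds it 1 second and reads off the result.
  print (destruct v)                  -- 30.0
```

So `construct` followed by `destruct` is the identity on `Double`, for quotients as well as for base quantities.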

We need to define an axiom that allows us to rearrange quotients:

    -- a / (b / c) = c / (b / a)
    quotientAxiom :: (Quantity a, Quantity b, Quantity c)
                  => Quotient a (Quotient b c) -> Quotient c (Quotient b a)
    quotientAxiom = construct . destruct

*Exercise for the reader:* Do we need this axiom, or is there a way to derive it without destructing any quantities? Alternatively, are there any other axioms we need?

And finally, we define the familiar arithmetic operations:

    -- We can add two quantities of the same unit.
    infixl 6 .+.
    (.+.) :: Quantity a => a -> a -> a
    (.+.) x y = construct $ (destruct x) + (destruct y)

    -- We can subtract two quantities of the same unit.
    infixl 6 .-.
    (.-.) :: Quantity a => a -> a -> a
    (.-.) x y = construct $ (destruct x) - (destruct y)

    -- We can multiply any two quantities.
    infixl 7 .*.
    (.*.) :: (Quantity a, Quantity b) => a -> b -> Product a b
    (.*.) x y = \z -> construct $ destruct (z y) * destruct x

    -- We can divide any two quantities.
    infixl 7 ./.
    (./.) :: (Quantity a, Quantity b) => a -> b -> Quotient a b
    (./.) x y = \z -> construct $ (destruct z) * (destruct x) / (destruct y)

Other primitive operations, such as comparison, can be defined similarly.
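For instance, comparison might look like the sketch below. The operators `.<.` and `.==.` are hypothetical names of my choosing, not something defined elsewhere in this article; the class and instance are repeated so the example compiles on its own.

```haskell
-- Definitions repeated from the article so the sketch stands alone.
data Meter

newtype BaseQuantity a = BaseQuantity Double

class Quantity a where
  construct :: Double -> a
  destruct :: a -> Double

instance Quantity (BaseQuantity a) where
  construct = BaseQuantity
  destruct (BaseQuantity x) = x

-- Hypothetical comparison operators. Both operands must have the
-- same unit, just like (.+.) and (.-.).
infix 4 .<.
(.<.) :: Quantity a => a -> a -> Bool
x .<. y = destruct x < destruct y

infix 4 .==.
(.==.) :: Quantity a => a -> a -> Bool
x .==. y = destruct x == destruct y

main :: IO ()
main = do
  let short = construct 1.5 :: BaseQuantity Meter
      long = construct 2.5 :: BaseQuantity Meter
  print (short .<. long)   -- True
  print (short .==. long)  -- False
```

Comparing a length with a mass is rejected by the type checker, because both arguments share the single type variable `a`.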

### Examples

The examples below will use these helpful type synonyms:

    type Length = BaseQuantity Meter
    type Mass = BaseQuantity Kilogram
    type Time = BaseQuantity Second
    type Area = Square Length
    type Velocity = Quotient Length Time

#### Example 1: Tracking units in types

Let’s calculate the area of a table.

    tableWidth :: Length
    tableWidth = construct 1.5

    tableHeight :: Length
    tableHeight = construct 2.5

    tableArea :: Area
    tableArea = tableWidth .*. tableHeight

Calculations with quantities are type safe. Suppose we defined the mass of the table:

    tableMass :: Mass
    tableMass = construct 150

Then the following is a type error:

    tableArea :: Area
    tableArea = tableWidth .*. tableMass
    -- Error: Couldn't match type ‘Kilogram’ with ‘Meter’
    --   Expected type: Area
    --   Actual type: Product Length Mass

#### Example 2: Quantities as functions

Suppose we want to calculate how far a train will travel, given its velocity and the duration of the trip.

    trainVelocity :: Velocity
    trainVelocity = construct 30

    tripDuration :: Time
    tripDuration = construct 5000

Here we demonstrate the correspondence between quantities and functions. `Velocity` is a synonym for `Length / Time`, but it’s also a function from `Time` to `Length`. Given a `Time`, we can simply “apply” a `Velocity` to it to get a `Length`:

    tripDistance :: Length
    tripDistance = trainVelocity tripDuration

So multiplication of `a / b` by `b` is just function application.

#### Example 3: Manual proofs of unit equality

Let’s define a function that takes a `Length` and a `Velocity` and returns a `Time`. First try:

    calculateDuration :: Length -> Velocity -> Time
    calculateDuration distance velocity = distance ./. velocity
    -- Error: Couldn't match type ‘Velocity -> Length’ with ‘BaseQuantity Second’
    --   Expected type: Time
    --   Actual type: Quotient Length Velocity

Haskell doesn’t know that `Length / Velocity = Time`. If this is indeed true, there will be a way to manipulate the program (without destructing the quantity) to make it type check by swapping arguments, defining new functions, using the `quotientAxiom`, etc. This is similar to what one would do in a proof assistant like Agda or Coq.

So we have:

    distance ./. velocity :: Length / Velocity

`Velocity` is a type synonym for `Length / Time`, so we actually have:

    distance ./. velocity :: Length / (Length / Time)

We can apply the `quotientAxiom` to get:

    quotientAxiom (distance ./. velocity) :: Time / (Length / Length)

Under the interpretation of units as function spaces, we have:

    quotientAxiom (distance ./. velocity) :: (Length -> Length) -> Time

We can apply `id` to cancel the `Length`s and get a `Time`. Putting it all together:

    calculateDuration :: Length -> Velocity -> Time
    calculateDuration distance velocity = quotientAxiom (distance ./. velocity) id

Now we can calculate how long the trip from *Example 2* would take if the train traveled at 40 m/s instead of 30 m/s.

    fasterVelocity :: Velocity
    fasterVelocity = construct 40

    shorterDuration :: Time
    shorterDuration = calculateDuration tripDistance fasterVelocity

#### Example 4: Destructing the results

When we are done calculating, we can convert the results back to `Double`s with the `destruct` function. Let’s print the results of the first three examples:

    main = do
      putStrLn $ "tableArea: " ++ (show $ destruct $ tableArea) ++ " m^2"
      putStrLn $ "tripDistance: " ++ (show $ destruct $ tripDistance) ++ " m"
      putStrLn $ "shorterDuration: " ++ (show $ destruct $ shorterDuration) ++ " s"

The output is:

    tableArea: 3.75 m^2
    tripDistance: 150000.0 m
    shorterDuration: 3750.0 s

### Conclusion

We can do type safe dimensional analysis by encoding units as function spaces. The basic pattern is:

- Construct whatever quantities you want using the `construct` function. A type annotation can be used to specify the units, or you can let type inference figure out the units automatically.
- Do type safe operations on these quantities using the operations `.+.`, `.*.`, etc. You may need to manually rearrange units or cancel them to satisfy the type checker, but this will always be possible if your units are actually correct (exercise for the reader: prove it!).
- When you are done with calculation and want to use the resulting quantities, the `destruct` function will convert them into `Double`s.

It’s inconvenient that we sometimes have to provide manual proofs of unit equivalence. It would be nice if rearranging and canceling units was completely automatic. But at least we have type safety!

## Decomposing a string into its elements

December 29, 2016

My colleague Esther proposed the following challenge: given a string, decompose it into elemental symbols from the periodic table (if possible). For example, `Hi Esther` becomes `H I Es Th Er`. In general there might be no solutions, one solution, or several.

I implemented it in Haskell with dynamic programming. The `elementize` function does all the work, using the list monad to compute all possible solutions.

    import Data.Char (isLetter, toLower)
    import Data.Function.Memoize (memoFix)
    import Data.List (intercalate, isPrefixOf)
    import System.Environment (getArgs)

    elements =
      [ "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"
      , "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca"
      , "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"
      , "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr"
      , "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd", "In", "Sn"
      , "Sb", "Te", "I", "Xe", "Cs", "Ba", "La", "Ce", "Pr", "Nd"
      , "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er", "Tm", "Yb"
      , "Lu", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg"
      , "Tl", "Pb", "Bi", "Po", "At", "Rn", "Fr", "Ra", "Ac", "Th"
      , "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk", "Cf", "Es", "Fm"
      , "Md", "No", "Lr", "Rf", "Db", "Sg", "Bh", "Hs", "Mt", "Ds"
      , "Rg", "Cn", "Uut", "Uuq", "Uup", "Uuh", "Uus", "Uuo"
      ]

    elementize = memoFix (\fn str ->
        if null str
          then return []
          else do
            element <- filterPrefixes elements str
            soln <- fn $ drop (length element) str
            return $ element : soln)
      where
        toLowerStr = map toLower
        filterPrefixes prefixes str =
          filter (\x -> isPrefixOf (toLowerStr x) (toLowerStr str)) prefixes

    main = do
      args <- getArgs
      putStrLn $ intercalate "\n" $ map (intercalate " ") (elementize $ filter isLetter $ concat args)

To compile the program, you need the memoize package. Once compiled, it can be run like this:

    $ ./elementizer hi esther
    H I Es Th Er
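If you would rather not install the memoize package, the same dynamic programming idea can be sketched with a lazy array indexed by suffix position. This is a variant of my program, not the program itself: `decompose` and the tiny `symbols` table are hypothetical stand-ins for `elementize` and `elements`, and `Data.Array` comes from the `array` package that ships with GHC.

```haskell
import Data.Array (listArray, (!))
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- A small, hypothetical subset of the element table, for illustration only.
symbols :: [String]
symbols = ["H", "I", "Es", "Th", "Er", "He", "S", "Te"]

-- decompose memoizes the solutions for every suffix in a lazy array,
-- so each suffix is solved at most once.
decompose :: String -> [[String]]
decompose str = table ! 0
  where
    n = length str
    lower = map toLower str
    -- table ! i holds every decomposition of the suffix starting at index i.
    table = listArray (0, n) [solve i | i <- [0 .. n]]
    solve i
      | i == n = [[]]
      | otherwise =
          [ sym : rest
          | sym <- symbols
          , map toLower sym `isPrefixOf` drop i lower
          , rest <- table ! (i + length sym)
          ]

main :: IO ()
main = mapM_ (putStrLn . unwords) (decompose "hiesther")  -- prints "H I Es Th Er"
```

Because the array elements are lazy thunks, only the suffixes that are actually reachable get evaluated.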

**Fun fact:** the longest word (in the macOS words file) that can be decomposed into elements is *nonrepresentationalism*, and it has 4 decompositions:

- `No N Re P Re Se N Ta Ti O Na Li Sm`
- `N O N Re P Re Se N Ta Ti O Na Li Sm`
- `No N Re P Re Se N Ta Ti O N Al I Sm`
- `N O N Re P Re Se N Ta Ti O N Al I Sm`

## The Di Vergent Code

December 24, 2016

### Prologue

Renowned NSA cryptographer Jacques Saunière hurried toward the east exit of the OPS2A building at Fort Meade. As he lunged for the door, the corridor went dark like it sensed his attempted escape. His face was dimly lit by the blue glow of his iPhone, with Mrs. Saunière’s number already dialed. The building was on lockdown. Saunière was trapped inside.

He heard the click of a pistol’s hammer locking into place, followed by a voice. “Put away the phone. Who knows about the oracle?”

“I told you already,” the mathematician stammered as he turned to face his killer. “I don’t know what you’re talking about!”

“You’ve made a big mistake, Saunière. The exploit will be released tonight. The world will see what happens when the NSA puts backdoors in their algorithms. Your citizens will never trust their government again.”

The cryptographer reached for his pocket and pulled out a knife. There wasn’t much time.

### Chapter 1

Robert Langdon awoke to a clamorous knock on the front door of his Airbnb. “Professor Langdon?” the visitor squalled from outside. “I need to talk to you. Can I come in?”

Langdon groaned. *Does this happen with every Airbnb?* He glanced down at a crumpled flyer on the bedside table, reminding him how his life ended up here.

CARNEGIE MELLON UNIVERSITY

proudly presents

an evening with Robert Langdon

Professor of Symbolic Systems, Stanford University

“Professor?” the voice continued. The drowsy computer scientist surrendered. The visitor made herself at home.

“My name is Sophie Neveu. I work for the NSA.”

Langdon greeted her. NSA agents weren’t his typical cup of tea, but this one seemed benign.

“Your life is in danger. What do you know about this symbol?” Neveu produced a grotesque photograph taken at NSA headquarters, depicting a lifeless Saunière with a strange-looking letter carved into his chest.

“Jesus! Who did this to him?”

“No, Professor, you misunderstand. Yes, Saunière was murdered, but he carved this symbol into his own body. We found a message draft on his phone: *Find Robert Langdon.*”

“That symbol is the Greek letter lambda. An icon of the ancient Church of Alonzo,” Langdon explained.

“The Church of Alonzo?”

“The Church, if it even still exists, preserves an ancient language rumored to give the clergy a miraculous power: the ability to construct new worlds. Entire universes, not bound by the laws of physics as we know them today. That language was called the *lambda calculus*.”

“Yes, the lambda calculus! That’s what I need to talk to you about. We have reason to believe there is a group of black hat insurgents with a computational oracle, some kind of impossibly powerful computer. They’re planning to use it to break the cryptographic algorithms that secure the Internet.” Neveu’s anxiety was contagious.

“What do we know about this oracle?”

“It’s fast—faster than any computer we’ve seen. But it only understands one language—”

“—the lambda calculus?” The computer scientist interjected.

“That’s right, and we need your help to break it.”

### Chapter 2

Neveu reached for her satchel and pulled out a laptop. She opened a terminal and was greeted with the following prompt:

    Welcome to the Lambda Oracle, version 1.0.0.
    Copyright © 1936. The Church of Alonzo.
    Enter terms at the prompt.

    $

Langdon glanced at the screen. “What is this?”

“It’s the oracle. My laptop has an SSH connection to it,” Neveu explained. “The insurgents did what they could to lock it down, but the NSA has tools for breaking into systems like this. Now, I need you to teach me that ancient language.”

Langdon recoiled. “You’re telling me you have access to a magical supercomputer with nearly infinite resources? I need to call my colleagues at Stanford—”

“No! You must not tell anyone!” Neveu bellowed.

“But this changes everything! Just imagine: we could build a massive neural network—a brain! Or we could run protein folding algorithms. We can cure diseases! This oracle can save the world!”

“—or destroy it.” Neveu didn’t share Langdon’s enthusiasm. “This computer can factor integers. You know what that means.”

“It can break cryptography,” Langdon realized.

“Exactly. Secure communications, bitcoin, vote counting, the New York Stock Exchange, all vulnerable. If we don’t stop it, this thing will catalyze the biggest financial collapse in human history. Civilization as we know it will be destroyed.” Neveu paused, as if she too just realized what this meant.

Langdon was convinced. “So how fast is this oracle?”

“Our analysts estimate it can do 10^100 beta reductions per second. A *googol*. More than the number of atoms in the observable universe. Per second.”

Langdon stood in awe.

### Chapter 3

“Teach me the ancient language,” Neveu demanded. “How does the lambda calculus work?”

“It’s remarkably simple,” Langdon explained. “It’s just functions. You know the identity function `x ↦ x`?”

“Sure.”

Langdon entered `λx. x` into the terminal. “This is how you write it in the lambda calculus.” He pressed the enter key. The oracle echoed his input.

    Welcome to the Lambda Oracle, version 1.0.0.
    Copyright © 1936. The Church of Alonzo.
    Enter terms at the prompt.

    $ λx. x
    Result: λx. x

Neveu looked puzzled. “What are the inputs and outputs of functions in the lambda calculus?”

“The inputs and outputs are also functions. It’s functions all the way down,” Langdon clarified.

“How do you do computation then?”

“You can apply a function to an argument,” Langdon continued, grabbing a pen and notebook. “In classical mathematics, you would write `f(x)`. In the lambda calculus, we just write `f x`.”

“How do you compute `f x`?”

“You evaluate expressions with *beta reduction*. You substitute the argument for the function’s formal parameter. *Plug and chug*, as they say. You keep doing that until no more reductions are possible. Consider the identity function. What happens if you apply it to itself? You just get the identity function back.” Langdon demonstrated his claim at the prompt.

    $ (λx. x) λx. x
    Result: λx. x

Neveu was astounded by the elegance and simplicity of this language. “That’s it?”

“That’s it,” Langdon confirmed. “Apparently the oracle can do that 10^100 times per second.”

“What about functions with multiple parameters?” Neveu’s curiosity was insatiable.

“In a higher-order system like the lambda calculus, you don’t need functions with multiple parameters. You can simulate them. Here is a function of one argument which returns a function of another argument, and that function applies the first argument to the second.”

    $ λf. λx. f x
    Result: λf. λx. f x

“It’s like a single function that takes two arguments `f` and `x`,” Langdon continued. “In fact, with a sacred ritual called Church encoding, you can express all kinds of things in this simple language: numbers, booleans, trees, etc. It’s rumored that the Church of Alonzo even found a way to do recursion.”

“What’s so hard about recursion?” Neveu wondered. “Can’t you just write `f = λx. f x`?”

“That’s the tricky part. The lambda calculus is such a simple language, it doesn’t even have `=`. But, according to legend, the Church had a special expression that implemented general recursion. They called it the Y combinator. No one knows how it was constructed.”

In that moment, Neveu realized their next move. “We have to find out how they did it. We’re going to hit the oracle with a denial of service attack and stop the insurgents from destroying the world.”

### Chapter 4

Langdon offered Neveu a bottle of Soylent, interrupting her daydream of re-discovering the legendary Y combinator. The algae in the drink triggered a complicated and poorly-understood metabolic pathway from her stomach to her brain, leading to a moment of brilliance.

“Professor, what if there was an expression which, when reduced, results in the same expression?”

“So you could reduce it again and again, ad infinitum?” Langdon was catching on.

“Exactly. Look at this function.” Neveu scribbled `λx. x x` into Langdon’s notebook. “It takes `x` and produces `x x`. In a sense, it duplicates its input. What if we used it to duplicate itself?”

“A mathematical copy machine!”

“Exactly.” Neveu reached for the keyboard and entered `(λx. x x) λx. x x`. Something strange happened.

    $ (λx. x x) λx. x x
    (λx. x x) λx. x x
    (λx. x x) λx. x x
    (λx. x x) λx. x x
    (λx. x x) λx. x x
    ...

“What is going on?” Langdon asked.

“The oracle keeps reducing this expression, but every reduction results in the original expression again. It’s stuck in an infinite loop!” Neveu exclaimed.

Langdon’s phone began to ring.

### Chapter 5

“Professor Langdon, this is NSA Director Bezu Fache. Do not react to this call. You are in grave danger. Sophie is lying to you.”

“Who is it?” Neveu asked.

“A colleague from Stanford. Something about the faculty lunch schedule. I’ll just be a moment.” Langdon returned to his call.

Fache continued, “Sophie is trying to use you to derive the Y combinator. She’s part of the insurgent group. You have to stop her!”

“How?”

“Derive it yourself before she does. Use it to destroy the oracle.” Fache ended the call.

Langdon looked up. Neveu was gone; she must have figured out what that call was really about. Time was running out.

He pondered Neveu’s mathematical copy machine. *It diverges, but it’s useless. It doesn’t help us compute factorial, Fibonacci, or any other recursive function.* Now it was his turn to be brilliant.

### Chapter 6

(λx. x x) λx. x x

Langdon stared at Neveu’s fascinating but useless discovery as he drank the Soylent.

“What if you wrap the `x x` in a function call? So instead of `x x`, we have `f (x x)`,” he spoke to himself, aloud, because it’s not weird to talk to yourself if it helps you think. “We could choose some function `f` to be called at every level of recursion.” He picked up his notebook and wrote:

    λf. (λx. f (x x)) (λx. f (x x))

*What does this function do? What happens when you apply it to some `g`?* Langdon wondered.

      (λf. (λx. f (x x)) (λx. f (x x))) g
    = (λx. g (x x)) (λx. g (x x))
    = g ((λx. g (x x)) (λx. g (x x)))

Langdon had just discovered `Y`, with the remarkable property that `Y g = g (Y g)` for any `g`. “What if `g` decides not to use its argument?” he pondered. “Then, in a language with lazy evaluation, the recursion would terminate!”

Let’s take a moment to understand what Professor Langdon just discovered. Suppose `g` is a curried function of two arguments (i.e., it takes the first argument and returns a function which takes the second argument). What happens when you compute `Y g`? You get back `g`, except the first argument has already been supplied. So you actually get a modified version of `g` that only takes the second argument. And what was given for the first argument? The modified `g`! So `g` can use its first argument to call itself.

Hoping the Church of Alonzo would forgive him for mixing a little Python into their sacred calculus, Langdon wrote out the factorial function:

    Y λf. λn. 1 if n == 0 else n * f (n - 1) end
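For readers following along at home, Langdon’s equation can be tried in a lazy language. The sketch below is in Haskell, where the untyped term `Y` itself does not type check without extra machinery, so it uses a function `fix` defined directly by the equation `Y g = g (Y g)`. The names `factorialStep` and `factorial` are made up for this illustration.

```haskell
-- fix satisfies the same equation as Y: fix g = g (fix g).
-- (Data.Function exports an equivalent fix; it is redefined here so the
-- correspondence with the story's equation is visible.)
fix :: (a -> a) -> a
fix g = g (fix g)

-- g is a curried function of two arguments: "itself" and n.
factorialStep :: (Integer -> Integer) -> Integer -> Integer
factorialStep self n = if n == 0 then 1 else n * self (n - 1)

-- Supplying the first argument via fix leaves a function of n alone.
factorial :: Integer -> Integer
factorial = fix factorialStep

main :: IO ()
main = print (factorial 5)  -- 120
```

The recursion stops because, when `n == 0`, `factorialStep` never touches `self`, and lazy evaluation never unfolds `fix` any further.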

His phone was ringing again.

### Chapter 7

“Professor Langdon, I have good news and bad news.” It was Fache.

“Bad news first.”

“Our analysts discovered the oracle has pre-emptive multitasking. So a divergent computation isn’t enough to destroy it,” Fache explained.

“What’s the good news?”

“It doesn’t have a garbage collector. Use the Y combinator to eat up all of its memory and stop Neveu from causing a global catastrophe.”

“You don’t need the Y combinator for that. You just need some divergent term that isn’t tail recursive. `(λx. x x x) λx. x x x` will do,” Langdon pointed out.

“Brilliant.” Fache ended the call without saying goodbye. Typical Fache.

## Formalizing dynamic scoping

November 26, 2016

The lambda calculus is a minimal functional programming language. We use it to study type systems, evaluation strategies, proof calculi, and other topics in programming languages and theoretical computer science. A recent discussion led me to wonder: can we use it to study scoping rules as well?

### Lexical vs. dynamic scoping

Most programming languages have *lexical scoping*. This means scopes are determined statically by the program structure. Name resolution happens by looking up the identifier first in the current block, then in the parent block, then in the grandparent block, etc. With *dynamic scoping*, scopes are defined by stack frames. Names are looked up in the current function first, then in the caller, then in the caller’s caller, etc. This can only happen at runtime, because the call stack doesn’t exist statically.

Consider the following JavaScript expression:

    (function(x) {
      return (function(x) {
        return function() {
          return x;
        };
      })('lexical')();
    })('dynamic')

As you might have guessed, this expression tells you what scoping strategy the programming language uses. Specifically, it evaluates to `'lexical'`, because JavaScript uses lexical scoping. Here is the same expression written in Emacs Lisp:

    ((lambda (x)
       (funcall ((lambda (x)
                   (lambda () x))
                 "lexical")))
     "dynamic")

This evaluates to `"dynamic"`, because Emacs Lisp uses dynamic scoping by default (that is, unless `lexical-binding` is enabled).

### The lambda calculus

The lambda calculus, like JavaScript, uses lexical scoping—as we will show! Below is a quick introduction to this versatile little programming language, in case you haven’t seen it before. Here is a more in-depth tutorial for curious readers.

#### Syntax

The syntax is quite spartan. A term (“term” is lambda-speak for “expression”) takes one of three forms:

- **Variable:** \( x \)
- **Abstraction:** \( \lambda x . t \) (where \( t \) extends as far right as possible)
- **Application:** \( t_1 \: t_2 \) (left-associative)

From these three building blocks, you can construct bigger terms like this curious one:

\[ \lambda f . \left( \lambda x . f \left( x \: x \right) \right) \left( \lambda x . f \left( x \: x \right) \right) \]

This term is known as the Y combinator, and it has the interesting property that it can be used to implement unbounded recursion. Let’s save that for another time.

#### Semantics

The operational semantics tells us how to evaluate a term:

\[ \frac{ t_1 \rightarrow t_1' }{ t_1 \: t_2 \rightarrow t_1' \: t_2 } \left( 1 \right) \]

\[ \frac{ t \rightarrow t' }{ v \: t \rightarrow v \: t' } \left( 2 \right) \]

\[ \frac{ }{ \left( \lambda x . t \right) \: v \rightarrow t \left[ v/x \right] } \left( 3 \right) \]

If you’ve never seen this notation before, here’s a quick explanation. Each rule reads like an if-then statement. The part above the line is the *antecedent* (the “if” part). The part below the line is the *consequent* (the “then” part). If the antecedent is true, then we can conclude the consequent is true also.

Rule (1) states: if some term \( t_1 \) reduces to \( t_1' \), then the application \( t_1 \: t_2 \) reduces to \( t_1' \: t_2 \). This just means when we are applying some term \( t_1 \) to another term \( t_2 \), we can try to reduce \( t_1 \) before doing the application.

Rule (2) states: if some term \( t \) reduces to \( t' \), then \( v \: t \) reduces to \( v \: t' \). The \( v \) means that the first term in the application is a *value*, i.e., an abstraction. We have this restriction to force a particular evaluation strategy (call-by-value). Of course, other evaluation strategies are possible too.

Rule (3) tells us how function application works. Note that there is no antecedent, which means we can conclude the consequent unconditionally. The consequent states that if we have some application \( \left( \lambda x . t \right) v \), then we can reduce it by substituting \( v \) for free occurrences of \( x \) in \( t \). Special care must be taken to ensure the substitution is done in a capture-avoiding manner.

#### Example

Remember that magical expression that tells us what scoping rule the language uses? Here it is written as a term in the lambda calculus (where \( \_ \) is some fresh variable and \( * \) is any term, needed only because abstractions in the lambda calculus must take an argument):

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \]

Convince yourself that this term matches the JavaScript and Emacs Lisp expressions above. How do we begin reducing it? First, we note that it’s an application, and all three rules are meant for applications. The left term is an abstraction and the right term is already a value (treating string literals as values), so neither rule (1) nor rule (2) applies and we use rule (3). The term reduces to:

\[ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \]

Because \( x \) was not free in the body of the abstraction, no substitution was necessary and \( \texttt{"dynamic"} \) disappeared entirely. So already we know the lambda calculus uses lexical scoping! Let’s continue anyway.

This term might look hard to parse, but remember that application is left-associative. So the left side is \( \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \) and the right side is \( * \). Since the left-side of the application is not a value (it can be reduced), we apply rule (1):

\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow t_1' }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow t_1' \: * } \left( 1 \right) \]

But what is \( t_1' \)? We use rule (3) to find it:

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} } \left( 3 \right) \]

Okay, so \( t_1' \) is \( \lambda \_ . \texttt{"lexical"} \). We are left with:

\[ \left( \lambda \_ . \texttt{"lexical"} \right) \: * \]

Applying rule (3) one last time gives:

\[ \texttt{"lexical"} \]

So the lambda calculus, as we’ve presented it here, uses lexical scoping. Here is the complete execution trace and relevant proofs:

\[ \frac{ }{ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * } \left( 3 \right) \]

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} } \left( 3 \right) \]
\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . \texttt{"lexical"} \right) \: * } \left( 1 \right) \]

\[ \frac{ }{ \left( \lambda \_ . \texttt{"lexical"} \right) \: * \rightarrow \texttt{"lexical"} } \left( 3 \right) \]

Note that all the proofs start with rule (3), since that’s the only rule without an antecedent.

### Semantics for dynamic scoping

Now that we have the tools to describe the semantics of a simple functional programming language, we can return to the main objective: to formalize dynamic scoping in the lambda calculus.

With the lexical scoping semantics, rule (3) handles applications by substituting the argument for the variable in the body of the abstraction (without evaluating the body). This will not work for dynamic scoping, since name resolution depends on the call stack and cannot be done statically. We will need to keep track of the variables that are currently in the “stack”.

Let \( \Gamma \) be a map from variables to values, and let \( \Gamma \oplus x : v \) denote the context formed by inserting \( x : v \) into \( \Gamma \) with replacement. Informally, \( \Gamma \) is the context that keeps track of the variables in the call stack and the values they are bound to.
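As a rough illustration of \( \oplus \), here is a Python sketch that models \( \Gamma \) as a dictionary. The function name `extend` and the use of plain strings for variables and values are assumptions made for this example only.

```python
def extend(gamma, x, v):
    """Return the context gamma ⊕ x : v without mutating gamma.

    If x is already bound in gamma, the new binding replaces the old one.
    """
    return {**gamma, x: v}


outer = {"x": "dynamic"}
inner = extend(outer, "x", "lexical")   # rebinding x replaces the old value
both = extend(outer, "y", "other")      # a new name is simply added
```

Returning a new dictionary rather than updating in place means the original context is still available after the extension, which matters whenever a rule mentions both \( \Gamma \) and \( \Gamma \oplus x : v \).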

Here are the reduction rules for dynamic scoping:

\[ \frac{ \Gamma \vdash t_1 \rightarrow t_1' }{ \Gamma \vdash t_1 \: t_2 \rightarrow t_1' \: t_2 } \left( 1 \right) \]

\[ \frac{ \Gamma \vdash t \rightarrow t' }{ \Gamma \vdash v \: t \rightarrow v \: t' } \left( 2 \right) \]

\[ \frac{ \Gamma \oplus x : v \vdash t \rightarrow t' }{ \Gamma \vdash \left( \lambda x . t \right) v \rightarrow \left( \lambda x . t' \right) v } \left( 3 \right) \]

\[ \frac{}{ \Gamma \vdash \left( \lambda x . v_1 \right) v_2 \rightarrow v_1 } \left( 4 \right) \]

\[ \frac{ x : v \in \Gamma}{ \Gamma \vdash x \rightarrow v } \left( 5 \right) \]

The first two rules are essentially the same as their lexically-scoped versions, except that they preserve the context \( \Gamma \) from antecedent to consequent. In other words, these two rules leave the stack unchanged.

Rule (3) reduces the body of an abstraction in an application. It adds \( x : v \) to the context, so that within the abstraction, the variable \( x \) is bound to \( v \).

When the body cannot be reduced any further, rule (4) eliminates the application, replacing it with the now-reduced body of the abstraction.

Rule (5) does dynamic name lookup: if a variable \( x \) is bound to a value \( v \) in the context \( \Gamma \), then \( x \) can be reduced to \( v \).
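The five rules can also be sketched as a step function that carries the context along. This is an illustrative Python sketch under stated assumptions, not a definitive implementation: the tuple encoding of terms, the helper names (`var`, `lam`, `app`, `lit`, `step`, `trace`), the dictionary representation of \( \Gamma \), and the treatment of string literals as values are all choices made for the example.

```python
# Hypothetical term encoding (an assumption of this sketch):
#   ("var", name) | ("lam", param, body) | ("app", fn, arg) | ("str", text)
def var(name): return ("var", name)
def lam(param, body): return ("lam", param, body)
def app(fn, arg): return ("app", fn, arg)
def lit(text): return ("str", text)


def is_value(t):
    # Assumption: string literals count as values alongside abstractions.
    return t[0] in ("lam", "str")


def step(t, gamma):
    """One reduction step of t under context gamma, or None if stuck."""
    if t[0] == "var":
        return gamma.get(t[1])                      # rule (5): dynamic lookup
    if t[0] != "app":
        return None                                 # values do not reduce
    fn, arg = t[1], t[2]
    if not is_value(fn):                            # rule (1)
        fn2 = step(fn, gamma)
        return None if fn2 is None else app(fn2, arg)
    if not is_value(arg):                           # rule (2)
        arg2 = step(arg, gamma)
        return None if arg2 is None else app(fn, arg2)
    if fn[0] != "lam":
        return None                                 # e.g. applying a string
    param, body = fn[1], fn[2]
    if is_value(body):                              # rule (4): drop the application
        return body
    body2 = step(body, {**gamma, param: arg})       # rule (3): reduce the body
    return None if body2 is None else app(lam(param, body2), arg)


def trace(t, gamma=None):
    """List every term visited while reducing t until no rule applies."""
    gamma = {} if gamma is None else gamma
    terms = [t]
    while True:
        nxt = step(terms[-1], gamma)
        if nxt is None:
            return terms
        terms.append(nxt)
```

As a small usage example, `trace(app(lam("x", var("x")), lit("hello")))` visits three terms: the original application, then `(λx. "hello") "hello"` after rules (5) and (3), and finally `"hello"` after rule (4). Note that there is no substitution function at all: variables are only resolved when they are reached, using whatever bindings are in the context at that moment.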

#### Example

Let’s evaluate this familiar term to see how it fares with the new semantics:

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \]

Here is the whole execution trace, and all relevant proofs:

\[ \frac{}{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . x } \left( 4 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . x }{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . x \right) \: * } \left( 1 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . x \right) \: * }{ \varnothing \vdash \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{ x : \texttt{"dynamic"} \in \left\{ x : \texttt{"dynamic"}, \_ : * \right\} }{ \left\{ x : \texttt{"dynamic"}, \_ : * \right\} \vdash x \rightarrow \texttt{"dynamic"} } \left( 5 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"}, \_ : * \right\} \vdash x \rightarrow \texttt{"dynamic"} }{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \left( \lambda \_ . \texttt{"dynamic"} \right) \: * } \left( 3 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \left( \lambda \_ . \texttt{"dynamic"} \right) \: * }{ \varnothing \vdash \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{}{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \rightarrow \texttt{"dynamic"} } \left( 4 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \rightarrow \texttt{"dynamic"} }{ \varnothing \vdash \left( \lambda x . \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \texttt{"dynamic"} \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{}{ \varnothing \vdash \left( \lambda x . \texttt{"dynamic"} \right) \: \texttt{"dynamic"} \rightarrow \texttt{"dynamic"} } \left( 4 \right) \]

So the original term evaluates to \( \texttt{"dynamic"} \), confirming that the rules give rise to dynamic scoping.

### Conclusion

I’m happy to report that the lambda calculus can be adapted to support dynamic scoping rules. Proofs in the resulting calculus are tractable, though a bit longer than those for the lexically-scoped calculus. Inspired by the notion of “context” from the inference rules of typed lambda calculi, we introduce a “stack context” for tracking the variables reachable by walking the call stack. Curiously, this technique seems to be equally capable of proving static properties of programs (e.g., type ascriptions) and managing dynamic scopes.