## Finding ambiguities in context-free grammars

January 2, 2017

Recently I was working on a parser, and I thought the grammar might be ambiguous. I looked around for a tool to help me detect ambiguities in context-free grammars, but I couldn’t find anything that worked. So I built CFG Checker!

It is only semi-decidable to determine whether an arbitrary context-free grammar is ambiguous. The best we can do is generate all derivations in a breadth-first fashion and look for two which produce the same sentential form. So that’s exactly what CFG Checker does. If the input grammar is ambiguous, CFG Checker will eventually find a minimal ambiguous sentential form. If the grammar is unambiguous, CFG Checker will either reach this conclusion or loop forever.

Here’s how it works. You specify the grammar as a series of production rules:

expression = number | sum sum = expression + expression

Then you run CFG Checker on it:

$ cfg-checker example.cfg .... Found a sentential form with two different derivations: expression + expression + expression Derivation 1: 0: expression 1: sum 2: expression + expression 3: expression + sum 4: expression + expression + expression Derivation 2: 0: expression 1: sum 2: expression + expression 3: sum + expression 4: expression + expression + expression

If it doesn’t finish within a few seconds, the grammar is probably unambiguous unless you gave it some pathological input.

CFG Checker is written in C++ and has no dependencies. For more details, see the GitHub project.

## Decomposing a string into its elements

December 29, 2016

My colleague Esther proposed the following challenge: given a string, decompose it into elemental symbols from the periodic table (if possible). For example,

`Hi Esther`

becomes `H I Es Th Er`

. In general there might be no solutions, one solution, or several.I implemented it in Haskell with dynamic programming. The

`elementize`

function does all the work, using the list monad to compute all possible solutions.import Data.Char (isLetter, toLower) import Data.Function.Memoize (memoFix) import Data.List (intercalate, isPrefixOf) import System.Environment (getArgs) elements = [ "H" , "He" , "Li" , "Be" , "B" , "C" , "N" , "O" , "F" , "Ne" , "Na" , "Mg" , "Al" , "Si" , "P" , "S" , "Cl" , "Ar" , "K" , "Ca" , "Sc" , "Ti" , "V" , "Cr" , "Mn" , "Fe" , "Co" , "Ni" , "Cu" , "Zn" , "Ga" , "Ge" , "As" , "Se" , "Br" , "Kr" , "Rb" , "Sr" , "Y" , "Zr" , "Nb" , "Mo" , "Tc" , "Ru" , "Rh" , "Pd" , "Ag" , "Cd" , "In" , "Sn" , "Sb" , "Te" , "I" , "Xe" , "Cs" , "Ba" , "La" , "Ce" , "Pr" , "Nd" , "Pm" , "Sm" , "Eu" , "Gd" , "Tb" , "Dy" , "Ho" , "Er" , "Tm" , "Yb" , "Lu" , "Hf" , "Ta" , "W" , "Re" , "Os" , "Ir" , "Pt" , "Au" , "Hg" , "Tl" , "Pb" , "Bi" , "Po" , "At" , "Rn" , "Fr" , "Ra" , "Ac" , "Th" , "Pa" , "U" , "Np" , "Pu" , "Am" , "Cm" , "Bk" , "Cf" , "Es" , "Fm" , "Md" , "No" , "Lr" , "Rf" , "Db" , "Sg" , "Bh" , "Hs" , "Mt" , "Ds" , "Rg" , "Cn" , "Uut", "Uuq", "Uup", "Uuh", "Uus" , "Uuo" ] elementize = memoFix (\fn str -> if null str then return [] else do element <- filterPrefixes elements str soln <- fn $ drop (length element) str return $ element : soln) where toLowerStr = map toLower filterPrefixes prefixes str = filter (\x -> isPrefixOf (toLowerStr x) (toLowerStr str)) prefixes main = do args <- getArgs putStrLn $ intercalate"\n" $ map (intercalate " ") (elementize $ filter isLetter $ concat args)

To compile the program, you need the memoize package. Once compiled, it can be run like this:

$ ./elementizer hi esther H I Es Th Er

**Fun fact:**the longest word (in the macOS words file) that can be decomposed into elements is

*nonrepresentationalism*, and it has 4 decompositions:

`No N Re P Re Se N Ta Ti O Na Li Sm`

`N O N Re P Re Se N Ta Ti O Na Li Sm`

`No N Re P Re Se N Ta Ti O N Al I Sm`

`N O N Re P Re Se N Ta Ti O N Al I Sm`

## The Di Vergent Code

December 24, 2016

### Prologue

Renowned NSA cryptographer Jacques Saunière hurried toward the east exit of the OPS2A building at Fort Meade. As he lunged for the door, the corridor went dark like it sensed his attempted escape. His face was dimly lit by the blue glow of his iPhone, with Mrs. Saunière’s number already dialed. The building was on lockdown. Saunière was trapped inside.

He heard the click of a pistol’s hammer locking into place, followed by a voice. “Put away the phone. Who knows about the oracle?”

“I told you already,” the mathematician stammered as he turned to face his killer. “I don’t know what you’re talking about!”

“You’ve made a big mistake, Saunière. The exploit will be released tonight. The world will see what happens when the NSA puts backdoors in their algorithms. Your citizens will never trust their government again.”

The cryptographer reached for his pocket and pulled out a knife. There wasn’t much time.

### Chapter 1

Robert Langdon awoke to a clamorous knock on the front door of his Airbnb. “Professor Langdon?” the visitor squalled from outside. “I need to talk to you. Can I come in?”

Langdon groaned.

*Does this happen with every Airbnb?*He glanced down at a crumpled flyer on the bedside table, reminding him how his life ended up here.CARNEGIE MELLON UNIVERSITY

proudly presents

an evening with Robert Langdon

Professor of Symbolic Systems, Stanford University

“Professor?” the voice continued. The drowsy computer scientist surrendered. The visitor made herself at home.

“My name is Sophie Neveu. I work for the NSA.”

Langdon greeted her. NSA agents aren’t his typical cup of tea, but this one seemed benign.

“Your life is in danger. What do you know about this symbol?” Neveu produced a grotesque photograph taken at NSA headquarters, depicting a lifeless Saunière with a strange-looking letter carved into his chest.

“Jesus! Who did this to him?”

“No, Professor, you misunderstand. Yes, Saunière was murdered, but he carved this symbol into his own body. We found a message draft on his phone:

*Find Robert Langdon.*”“That symbol is the Greek letter lambda. An icon of the ancient Church of Alonzo,” Langdon explained.

“The Church of Alonzo?”

“The Church, if it even still exists, preserves an ancient language rumored to give the clergy a miraculous power: the ability to construct new worlds. Entire universes, not bound by the laws of physics as we know them today. That language was called the

*lambda calculus*”.“Yes—the lambda calculus! That’s what I need to talk to you about. We have reason to believe there is a group of black hat insurgents with a computational oracle, some kind of impossibly powerful computer. They’re planning to use it to break the cryptographic algorithms that secure the Internet.” Neveu’s anxiety was contagious.

“What do we know about this oracle?”

“It’s fast—faster than any computer we’ve seen. But it only understands one language—”

“—the lambda calculus?” The computer scientist interjected.

“That’s right, and we need your help to break it.”

### Chapter 2

Neveu reached for her satchel and pulled out a laptop. She opened a terminal and was greeted with the following prompt:

Welcome to the Lambda Oracle, version 1.0.0. Copyright © 1936. The Church of Alonzo. Enter terms at the prompt. $

Langdon glanced at the screen. “What is this?”

“It’s the oracle. My laptop has an SSH connection to it,” Neveu explained. “The insurgents did what they could to lock it down, but the NSA has tools for breaking into systems like this. Now, I need you to teach me that ancient language.”

Langdon recoiled. “You’re telling me you have access to a magical supercomputer with nearly infinite resources? I need to call my colleagues at Stanford—”

“No! You must not tell anyone!” Neveu bellowed.

“But this changes everything! Just imagine: we could build a massive neural network—a brain! Or we could run protein folding algorithms. We can cure diseases! This oracle can save the world!”

“—or destroy it.” Neveu didn’t share Langdon’s enthusiasm. “This computer can factor integers. You know what that means.”

“It can break cryptography,” Langdon realized.

“Exactly. Secure communications, bitcoin, vote counting, the New York Stock Exchange, all vulnerable. If we don’t stop it, this thing will catalyze the biggest financial collapse in human history. Civilization as we know it will be destroyed.” Neveu paused, as if she too just realized what this meant.

Langdon was convinced. “So how fast is this oracle?”

“Our analysts estimate it can do 10^100 beta reductions per second. A

*googol*. More than the number of atoms in the observable universe. Per second.”Langdon stood in awe.

### Chapter 3

“Teach me the ancient language,” Neveu demanded. “How does the lambda calculus work?”

“It’s remarkably simple,” Langdon explained. “It’s just functions. You know the identity function

`x ↦ x`

?”“Sure.”

Langdon entered

`λx. x`

into the terminal. “This is how you write it in the lambda calculus.” He pressed the enter key. The oracle echoed his input.Welcome to the Lambda Oracle, version 1.0.0. Copyright © 1936. The Church of Alonzo. Enter terms at the prompt. $ λx. x Result: λx. x

Neveu looked puzzled. “What are the inputs and outputs of functions in the lambda calculus?”

“The inputs and outputs are also functions. It’s functions all the way down,” Langdon clarified.

“How do you do computation then?”

“You can apply a function to an argument,” Langdon continued, grabbing a pen and notebook. “In classical mathematics, you would write

`f(x)`

. In the lambda calculus, we just write `f x`

.”“How do you compute

`f x`

?”“You evaluate expressions with

*beta reduction*. You substitute the argument for the function’s formal parameter.*Plug and chug*, as they say. You keep doing that until no more reductions are possible. Consider the identity function. What happens if you apply it to itself? You just get the identity function back.” Langdon demonstrated his claim at the prompt.$ (λx. x) λx. x Result: λx. x

Neveu was astounded by the elegance and simplicity of this language. “That’s it?”

“That’s it,” Langdon confirmed. “Apparently the oracle can do that 10^100 times per second.”

“What about functions with multiple parameters?” Neveu’s curiosity was insatiable.

“In a higher-order system like the lambda calculus, you don’t need functions with multiple parameters. You can simulate them. Here is a function of one argument which returns a function of another argument, and that function applies the first argument to the second.”

$ λf. λx. f x Result: λf. λx. f x

“It’s like a single function that takes two arguments

`f`

and `x`

,” Langdon continued. “In fact, with a sacred ritual called Church encoding, you can express all kinds of things in this simple language: numbers, booleans, trees, etc. It’s rumored that the Church of Alonzo even found a way to do recursion.”“What’s so hard about recursion?” Neveu wondered. “Can’t you just write

`f = λx. f x`

?”“That’s the tricky part. The lambda calculus is such a simple language, it doesn’t even have

`=`

. But, according to legend, the Church has a special expression that implemented general recursion. They called it the Y combinator. No one knows how it was constructed.”In that moment, Neveu realized their next move. “We have to find out how they did it. We’re going to hit the oracle with a denial of service attack and stop the insurgents from destroying the world.”

### Chapter 4

Langdon offered Neveu a bottle of Soylent, interrupting her daydream of re-discovering the legendary Y combinator. The algae in the drink triggered a complicated and poorly-understood metabolic pathway from her stomach to her brain, leading to a moment of brilliance.

“Professor, what if there was an expression which, when reduced, results in the same expression?”

“So you could reduce it again and again, ad infinitum?” Langdon was catching on.

“Exactly. Look at this function.” Neveu scribbled

`λx. x x`

into Langdon’s notebook. “It takes `x`

and produces `x x`

. In a sense, it duplicates its input. What if we used it to duplicate itself?”“A mathematical copy machine!”

“Exactly.” Neveu reached for the keyboard and entered

`(λx. x x) λx. x x`

. Something strange happened.$ (λx. x x) λx. x x (λx. x x) λx. x x (λx. x x) λx. x x (λx. x x) λx. x x (λx. x x) λx. x x ...

“What is going on?” Langdon asked.

“The oracle keeps reducing this expression, but every reduction results in the original expression again. It’s stuck in an infinite loop!” Neveu exclaimed.

Langdon’s phone began to ring.

### Chapter 5

“Professor Langdon, this is NSA Director Bezu Fache. Do not react to this call. You are in grave danger. Sophie is lying to you.”

“Who is it?” Neveu asked.

“A colleague from Stanford. Something about the faculty lunch schedule. I’ll just be a moment.” Langdon returned to his call.

Fache continued, “Sophie is trying to use you to derive the Y combinator. She’s part of the insurgent group. You have to stop her!”

“How?”

“Derive it yourself before she does. Use it to destroy the oracle.” Fache ended the call.

Langdon looked up. Neveu was gone; she must have figured out what that call was really about. Time was running out.

He pondered Neveu’s mathematical copy machine.

*It diverges, but it’s useless. It doesn’t help us compute factorial, Fibonacci, or any other recursive function.*Now it was his turn to be brilliant.### Chapter 6

(λx. x x) λx. x x

Langdon stared at Neveu’s fascinating but useless discovery as he drank the Soylent.

“What if you wrap the

`x x`

in a function call? So instead of `x x`

, we have `f (x x)`

,” he spoke to himself—aloud—because it’s not weird to talk to yourself if it helps you think. “We could choose some function `f`

to be called at every level of recursion.” He picked up his notebook and wrote:λf. (λx. f (x x)) (λx. f (x x))

*What does this function do? What happens when you apply it to some*Langdon wondered.

`g`

?(λf. (λx. f (x x)) (λx. f (x x))) g = (λx. g (x x)) (λx. g (x x)) = g ((λx. g (x x)) (λx. g (x x)))

Langdon had just discovered

`Y`

, with the remarkable property that `Y g = g (Y g)`

for any `g`

. “What if `g`

decides not to use its argument?” he pondered. “Then, in a language with lazy evaluation, the recursion would terminate!”Dear reader, let’s take a moment to understand what Professor Langdon just discovered. Suppose

`g`

is a curried function of two arguments (i.e., it takes the first argument and returns a function which takes the second argument). What happens when you compute `Y g`

? You get back `g`

, except the first argument has already been supplied. So you actually get a modified version of `g`

that only takes the second argument. And what was given for the first argument? The modified `g`

! So `g`

can use its first argument to call itself.Hoping the Church of Alonzo would forgive him for mixing a little Python into their sacred calculus, Langdon wrote out the factorial function:

Y λf. λn. 1 if n == 0 else n * f (n - 1) end

His phone was ringing again.

### Chapter 7

“Professor Langdon, I have good news and bad news.” It was Fache.

“Bad news first.”

“Our analysts discovered the oracle has pre-emptive multitasking. So a divergent computation isn’t enough to destroy it,” Fache explained.

“What’s the good news?”

“It doesn’t have a garbage collector. Use the Y combinator to eat up all of its memory and stop Neveu from causing a global catastrophe.”

“You don’t need the Y combinator for that. You just need some divergent term that isn’t tail recursive.

`(λx. x x x) λx. x x x`

will do,” Langdon pointed out.“Brilliant.” Fache ended the call without saying goodbye. Typical Fache.

## Formalizing dynamic scoping

November 26, 2016

The lambda calculus is a minimal functional programming language. We use it to study type systems, evaluation strategies, proof calculi, and other topics in programming languages and theoretical computer science. A recent discussion led me to wonder: can we use it to study scoping rules as well?

### Lexical vs. dynamic scoping

Most programming languages have

*lexical scoping*. This means scopes are determined statically by the program structure. Name resolution happens by looking up the identifier first in the current block, then in the parent block, then in the grandparent block, etc. With*dynamic scoping*, scopes are defined by stack frames. Names are looked up in the current function first, then in the caller, then in the caller’s caller, etc. This can only happen at runtime, because the call stack doesn’t exist statically.Consider the following JavaScript expression:

(function(x) { return (function(x) { return function() { return x; }; })('lexical')(); })('dynamic')

As you might have guessed, this expression tells you what scoping strategy the programming language uses. Specifically, it evaluates to

`'lexical'`

, because JavaScript uses lexical scoping. Here is the same expression written in Emacs Lisp:((lambda (x) (funcall ((lambda (x) (lambda () x ) ) "lexical")) ) "dynamic")

This evaluates to

`"dynamic"`

, because Emacs Lisp has dynamic scoping.### The lambda calculus

The lambda calculus, like JavaScript, uses lexical scoping—as we will show! Below is a quick introduction to this versatile little programming language, in case you haven’t seen it before. Here is a more in-depth tutorial for curious readers.

#### Syntax

The syntax is quite spartan. A term (“term” is lambda-speak for “expression”) takes one of three forms:

**Variable:**\( x \)**Abstraction:**\( \lambda x . t \) (where \( t \) extends as far right as possible)**Application:**\( t_1 \: t_2 \) (left-associative)

From these three building blocks, you can construct bigger terms like this curious one:

\[ \lambda f . \left( \lambda x . f \left( x \: x \right) \right) \left( \lambda x . f \left( x \: x \right) \right) \]

This term is known as the Y combinator, and it has the interesting property that it can be used to implement unbounded recursion. Let’s save that for another time.

#### Semantics

The operational semantics tells us how to evaluate a term:

\[ \frac{ t_1 \rightarrow t_1’ }{ t_1 \: t_2 \rightarrow t_1’ \: t_2 } \left( 1 \right) \]

\[ \frac{ t \rightarrow t’ }{ v \: t \rightarrow v \: t’ } \left( 2 \right) \]

\[ \frac{ }{ \left( \lambda x . t \right) \: v \rightarrow t \left[ v/x \right] } \left( 3 \right) \]

If you’ve never seen this notation before, here’s a quick explanation. Each rule reads like an if-then statement. The part above the line is the

*antecedent*(the “if” part). The part below the line is the*consequent*(the “then” part). If the antecedent is true, then we can conclude the consequent is true also.Rule (1) states: if some term \( t_1 \) reduces to \( t_1' \), then the application \( t_1 \: t_2 \) reduces to \( t_1' \: t_2 \). This just means when we are applying some term \( t_1 \) to another term \( t_2 \), we can try to reduce \( t_1 \) before doing the application.

Rule (2) states: if some term \( t \) reduces to \( t' \), then \( v \: t \) reduces to \( v \: t' \). The \( v \) means that the first term in the application is a

*value*, i.e., an abstraction. We have this restriction to force a particular evaluation strategy (call-by-value). Of course, other evaluation strategies are possible too.Rule (3) tells us how function application works. Note that there is no antecedent, which means we can conclude the consequent unconditionally. The consequent states that if we have some application \( \left( \lambda x . t \right) v \), then we can reduce it by substituting \( v \) for free occurrences of \( x \) in \( t \). Special care must be taken to ensure the substitution is done in a capture-avoiding manner.

#### Example

Remember that magical expression that tells us what scoping rule the language uses? Here it is written as a term in the lambda calculus (where \( \_ \) is some fresh variable and \( * \) is any term, needed only because abstractions in the lambda calculus must take an argument):

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \right) \: \texttt{“dynamic”} \]

Convince yourself that this term matches the JavaScript and Emacs Lisp expressions above. How do we begin reducing it? First, we note that it’s an application. Both rules (1) and (3) are meant for applications. We see that the left term of the application is a value, so we apply rule (3). The term reduces to:

\[ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \]

Because \( x \) was not free in the body of the abstraction, no substitution was necessary and \( \texttt{"dynamic"} \) disappeared entirely. So already we know the lambda calculus uses lexical scoping! Let’s continue anyway.

This term might look hard to parse, but remember that application is left-associative. So the left side is \( \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \) and the right side is \( * \). Since the left-side of the application is not a value (it can be reduced), we apply rule (1):

\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow t_1’ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \rightarrow t_1’ \: * } \left( 1 \right) \]

But what is \( t_1' \)? We use rule (3) to find it:

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow \lambda \_ . \texttt{“lexical”} } \left( 3 \right) \]

Okay, so \( t_1' \) is \( \lambda \_ . \texttt{"lexical"} \). We are left with:

\[ \left( \lambda \_ . \texttt{“lexical”} \right) \: * \]

Applying rule (3) one last time gives:

\[ \texttt{“lexical”} \]

So the lambda calculus, as we’ve presented here, uses lexical scoping. Here is the complete execution trace and relevant proofs:

\[ \frac{ }{ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \right) \: \texttt{“dynamic”} \rightarrow \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * } \left( 3 \right) \]

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow \lambda \_ . \texttt{“lexical”} } \left( 3 \right) \]
\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow \lambda \_ . \texttt{“lexical”} }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \rightarrow \left( \lambda \_ . \texttt{“lexical”} \right) \: * } \left( 1 \right) \]

\[ \frac{ }{ \left( \lambda \_ . \texttt{“lexical”} \right) \: * \rightarrow \texttt{“lexical”} } \left( 3 \right) \]

Note that all the proofs start with rule (3), since that’s the only rule without an antecedent.

### Semantics for dynamic scoping

Now that we have the tools to describe the semantics of a simple functional programming language, we can return to the main objective: to formalize dynamic scoping in the lambda calculus.

With the lexical scoping semantics, rule (3) handles applications by substituting the argument for the variable in the body of the abstraction (without evaluating the body). This will not work for dynamic scoping, since name resolution depends on the call stack and cannot be done statically. We will need to keep track of the variables that are currently in the “stack”.

Let \( \Gamma \) be a map from variables to values, and let \( \Gamma \oplus x : v \) denote the context formed by inserting \( x : v \) into \( \Gamma \) with replacement. Informally, \( \Gamma \) is the context that keeps track of the variables in the call stack and the values they are bound to.

Here are the reduction rules for dynamic scoping:

\[ \frac{ \Gamma \vdash t_1 \rightarrow t_1’ }{ \Gamma \vdash t_1 \: t_2 \rightarrow t_1’ \: t_2 } \left( 1 \right) \]

\[ \frac{ \Gamma \vdash t \rightarrow t’ }{ \Gamma \vdash v \: t \rightarrow v \: t’ } \left( 2 \right) \]

\[ \frac{ \Gamma \oplus x : v \vdash t \rightarrow t’ }{ \Gamma \vdash \left( \lambda x . t \right) v \rightarrow \left( \lambda x . t’ \right) v } \left( 3 \right) \]

\[ \frac{}{ \Gamma \vdash \left( \lambda x . v_1 \right) v_2 \rightarrow v_1 } \left( 4 \right) \]

\[ \frac{ x : v \in \Gamma}{ \Gamma \vdash x \rightarrow v } \left( 5 \right) \]

The first two rules are essentially the same as their lexically-scoped versions, except that they preserve the context \( \Gamma \) from antecedent to consequent. In other words, these two rules leave the stack unchanged.

Rule (3) reduces the body of an abstraction in an application. It adds \( x : v \) to the context, so that within the abstraction, the variable \( x \) is bound to \( v \).

When the body cannot be reduced any further, rule (4) eliminates the application, replacing it with the now-reduced body of the abstraction.

Rule (5) does dynamic name lookup: if a variable \( x \) is bound to a value \( v \) in the context \( \Gamma \), then \( x \) can be reduced to \( v \).

#### Example

Let’s evaluate this familiar term to see how it fares with the new semantics:

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \right) \: \texttt{“dynamic”} \]

Here is the whole execution trace, and all relevant proofs:

\[ \frac{}{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow \lambda \_ . x } \left( 4 \right) \]
\[ \frac{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \rightarrow \lambda \_ . x }{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \rightarrow \left( \lambda \_ . x \right) \: * } \left( 1 \right) \]
\[ \frac{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \rightarrow \left( \lambda \_ . x \right) \: * }{ \varnothing \vdash \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{“lexical”} \: * \right) \: \texttt{“dynamic”} \rightarrow \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{“dynamic”} } \left( 3 \right) \]

\[ \frac{ x : \texttt{“dynamic”} \in \left\{ x : \texttt{“dynamic”}, \_ : * \right\} }{ \left\{ x : \texttt{“dynamic”}, \_ : * \right\} \vdash x \rightarrow \texttt{“dynamic”} } \left( 5 \right) \]
\[ \frac{ \left\{ x : \texttt{“dynamic”}, \_ : * \right\} \vdash x \rightarrow \texttt{“dynamic”} }{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \texttt{“dynamic”} } \left( 3 \right) \]
\[ \frac{ \left\{ x : \texttt{“dynamic”} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \texttt{“dynamic”} }{ \varnothing \vdash \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{“dynamic”} \rightarrow \left( \lambda x . x \right) \: \texttt{“dynamic”} } \left( 3 \right) \]

\[ \frac{ x : \texttt{“dynamic”} \in \left\{ x : \texttt{“dynamic”} \right\}}{ \left\{ x : \texttt{“dynamic”} \right\} \vdash x \rightarrow \texttt{“dynamic”} } \left( 5 \right) \]
\[ \frac{ \left\{ x : \texttt{“dynamic”} \right\} \vdash x \rightarrow \texttt{“dynamic”} }{ \varnothing \vdash \left( \lambda x . x \right) \: \texttt{“dynamic”} \rightarrow \left( \lambda x . \texttt{“dynamic”} \right) \: \texttt{“dynamic”} } \left( 3 \right) \]

\[ \frac{}{ \varnothing \vdash \left( \lambda x . \texttt{“dynamic”} \right) \: \texttt{“dynamic”} \rightarrow \texttt{“dynamic”} } \left( 4 \right) \]

So the original term evaluates to \( \texttt{"dynamic"} \), confirming that the rules give rise to dynamic scoping.

### Conclusion

I’m happy to report that the lambda calculus can be adapted to support dynamic scoping rules. Proofs in the resulting calculus are tractable, though a bit longer than those for the lexically-scoped calculus. Inspired by the notion of “context” from the inference rules of typed lambda calculi, we introduce a “stack context” for tracking the variables reachable by walking the call stack. Curiously, this technique seems to be equally capable of proving static properties of programs (e.g., type ascriptions) and managing dynamic scopes.

## Higher-rank and higher-kinded types

April 23, 2016

If you enjoy statically-typed functional programming, you’ve probably heard of higher-rank and higher-kinded types; otherwise, perhaps you might like a quick primer? 😊

### Background: what is parametric polymorphism?

You may be familiar with “generics” in Java, or “templates” in C++. \(^1\) is the jargon that characterizes these features. It allows us to abstract types from values, analogous to how ordinary functions abstract values from values. For example, consider this function which chooses one of two strings at random:

*Parametric polymorphism*String chooseString(String head, String tail) { if (Math.random() < 0.5) { return head; } else { return tail; } }

We might want to choose between values of other types too, so we abstract

`String`

away from this method as a generic type parameter `T`

:<T> T chooseValue(T head, T tail) { if (Math.random() < 0.5) { return head; } else { return tail; } }

Now we can compute

`chooseValue("heads", "tails")`

or `chooseValue(0, 1)`

or apply values of any other type, as long as both arguments have the same type. That’s the essence of parametric polymorphism.### Higher-rank types

Parametric polymorphism allows for “functions” from types to values. In the example above,

`chooseValue`

is a function which (implicitly) takes a type `T`

, and then two arguments of type `T`

, and returns a `T`

.You may know that in most modern programming languages, functions are first-class values—they can be stored in variables, passed to functions, etc. In most statically-typed programming languages, however, polymorphic functions are not first-class. You cannot pass

`chooseValue`

to another function or store it in a variable, and then apply it polymorphically later on. Whenever you want to refer to it, you must specify up front (explicitly or implicitly) a type for `T`

, thereby converting it into a monomorphic function.The idea of higher-rank types is to make polymorphic functions first-class, just like regular (monomorphic) functions.\(^2\) Here’s a contrived example that requires higher-rank types, and is thus not valid Java:

String chooseStringWeird( <T> BiFunction<T, T, T> randomChoice, String head, String tail ) { if (randomChoice(true, false)) { return randomChoice(head, tail); } else { return randomChoice(tail, head); } } String bestEditor() { return chooseStringWeird(chooseValue, "emacs", "vim"); }

Higher-rank types don’t come up very often, and supporting them has the unfortunate consequence of making complete type inference undecidable.\(^3\) Even Haskell, the proverbial playground for popular polymorphism paradigms, requires a language extension to support them.

### Higher-kinded types

If parametric polymorphism involves functions from types to values, you might wonder about functions from types to types. These type-level functions are called

*type operators*. An example of a type operator in Java is`Stack`

:Stack<String> names;

`Stack`

can be thought of as a function that takes a type (in this case, `String`

) and returns a type. With higher-kinded types, type operators are first-class types, similar to how higher-rank types treat polymorphic functions as first class values. In other words, type operators can be parameterized by other type operators. Java doesn’t support higher-kinded types, but, if it did, you would be able to write something like this:class GraphSearch<T> { T<Node> frontier; }

`GraphSearch`

is a type operator, and its parameter `T`

stands for another type operator such as `Stack`

or `Queue`

:GraphSearch<Stack> depthFirstSearch; GraphSearch<Queue> breadthFirstSearch;

The “type” of a type is called a

*kind*. The kind of an ordinary (“proper”) type is usually written`*`

, and type constructors (unary type operators) have kind `* -> *`

. `GraphSearch`

has kind `(* -> *) -> *`

. Types such as `GraphSearch`

are called *higher-kinded*because they are higher-order functions on the level of types, and they are “typed” using kinds.Unlike higher-rank types, higher-kinded types are commonplace in typed functional programming.

`Functor`

and `Monad`

are examples of higher-kinded polymorphism in Haskell. `Monad`

in particular is a simple idea, but has a reputation for being difficult to understand for non-functional programmers. I suspect part of this confusion is due to the fact that `Monad`

is higher-kinded, which is an idea most programmers are not accustomed to. Maybe this article will help! 😊### Conclusion

In summary, parametric polymorphism introduces functions from types to values, and type operators are functions from types to types. When these functions are “first-class” in the sense that they can be used in a higher-order fashion, they are called

*higher-rank*and*higher-kinded*, respectively.There is one possibility we haven’t explored: functions from values to types. These are called “dependent types” and open up a mind-blowing world of programming with proofs. Dependent types are outside the scope of this article, but they are 100% worth learning about if you are interested in type theory.

Speaking of further learning, I highly recommend Benjamin Pierce’s book

*Types and Programming Languages*, which is, in my opinion, lightyears ahead of any other introductory resource on type systems. Chapter 23 introduces parametric polymorphism with higher-rank types (System F), and Chapter 29 discusses higher-kinded types (System \(\lambda_{\omega}\)).I hope that was helpful! Let me know in the comments if anything could be explained better.

### Footnotes

[1] The term “polymorphism” is autological. In addition to parametric polymorphism, it can also refer to subtype polymorphism or ad hoc polymorphism.

[2] This kind of polymorphism is also called

[3] This negative result is from J.B. Wells, 1999.

[2] This kind of polymorphism is also called

*rank-n polymorphism*or*first-class polymorphism*.[3] This negative result is from J.B. Wells, 1999.