## Formalizing dynamic scoping

November 26, 2016

The lambda calculus is a minimal functional programming language. We use it to study type systems, evaluation strategies, proof calculi, and other topics in programming languages and theoretical computer science. A recent discussion led me to wonder: can we use it to study scoping rules as well?

### Lexical vs. dynamic scoping

Most programming languages have *lexical scoping*. This means scopes are determined statically by the program structure. Name resolution happens by looking up the identifier first in the current block, then in the parent block, then in the grandparent block, etc. With *dynamic scoping*, scopes are defined by stack frames. Names are looked up in the current function first, then in the caller, then in the caller’s caller, etc. This can only happen at runtime, because the call stack doesn’t exist statically.

Consider the following JavaScript expression:

```javascript
(function(x) {
  return (function(x) {
    return function() { return x; };
  })('lexical')();
})('dynamic')
```

As you might have guessed, this expression tells you what scoping strategy the programming language uses. Specifically, it evaluates to `'lexical'`, because JavaScript uses lexical scoping. Here is the same expression written in Emacs Lisp:

```elisp
((lambda (x)
   (funcall ((lambda (x) (lambda () x)) "lexical")))
 "dynamic")
```

This evaluates to `"dynamic"`, because Emacs Lisp has dynamic scoping.

### The lambda calculus

The lambda calculus, like JavaScript, uses lexical scoping, as we will show! Below is a quick introduction to this versatile little programming language, in case you haven’t seen it before. Here is a more in-depth tutorial for curious readers.

#### Syntax

The syntax is quite spartan. A term (“term” is lambda-speak for “expression”) takes one of three forms:

- **Variable:** \( x \)
- **Abstraction:** \( \lambda x . t \) (where \( t \) extends as far right as possible)
- **Application:** \( t_1 \: t_2 \) (left-associative)
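
If it helps to see the grammar as code, here is a sketch of it as a Haskell data type, which we’ll reuse later in this post. The `Lit` constructor is our own addition for the string literals that appear in the examples below; it is not part of the pure calculus.

```haskell
-- A sketch of the term grammar. Lit is an extra convenience for string
-- literals; the pure calculus has only the first three constructors.
data Term
  = Var String        -- variable:     x
  | Lam String Term   -- abstraction:  λx. t
  | App Term Term     -- application:  t1 t2
  | Lit String        -- string literal (not part of the pure calculus)
  deriving (Show, Eq)
```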

From these three building blocks, you can construct bigger terms like this curious one:

\[ \lambda f . \left( \lambda x . f \left( x \: x \right) \right) \left( \lambda x . f \left( x \: x \right) \right) \]

This term is known as the Y combinator, and it has the interesting property that it can be used to implement unbounded recursion. Let’s save that for another time.

#### Semantics

The operational semantics tells us how to evaluate a term:

\[ \frac{ t_1 \rightarrow t_1' }{ t_1 \: t_2 \rightarrow t_1' \: t_2 } \left( 1 \right) \]

\[ \frac{ t \rightarrow t' }{ v \: t \rightarrow v \: t' } \left( 2 \right) \]

\[ \frac{ }{ \left( \lambda x . t \right) \: v \rightarrow t \left[ v/x \right] } \left( 3 \right) \]

If you’ve never seen this notation before, here’s a quick explanation. Each rule reads like an if-then statement. The part above the line is the *antecedent* (the “if” part). The part below the line is the *consequent* (the “then” part). If the antecedent is true, then we can conclude the consequent is true also.

Rule (1) states: if some term \( t_1 \) reduces to \( t_1' \), then the application \( t_1 \: t_2 \) reduces to \( t_1' \: t_2 \). This just means when we are applying some term \( t_1 \) to another term \( t_2 \), we can try to reduce \( t_1 \) before doing the application.

Rule (2) states: if some term \( t \) reduces to \( t' \), then \( v \: t \) reduces to \( v \: t' \). The \( v \) means that the first term in the application is a *value*, i.e., an abstraction. We have this restriction to force a particular evaluation strategy (call-by-value). Of course, other evaluation strategies are possible too.

Rule (3) tells us how function application works. Note that there is no antecedent, which means we can conclude the consequent unconditionally. The consequent states that if we have some application \( \left( \lambda x . t \right) \: v \), then we can reduce it by substituting \( v \) for free occurrences of \( x \) in \( t \). Special care must be taken to ensure the substitution is done in a capture-avoiding manner.
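
To make these rules concrete, here is a minimal Haskell sketch of this call-by-value semantics, using the `Term` type from above. Note that the substitution here is naive: it handles shadowing but performs no renaming, so it is capture-avoiding only for the closed arguments used in this post.

```haskell
isValue :: Term -> Bool
isValue (Lam _ _) = True
isValue (Lit _)   = True
isValue _         = False

-- subst x v t computes t[v/x]. Naive: no alpha-renaming, which is
-- fine here because every v we substitute is closed.
subst :: String -> Term -> Term -> Term
subst x v (Var y)     = if x == y then v else Var y
subst x v (Lam y t)   = if x == y then Lam y t else Lam y (subst x v t)
subst x v (App t1 t2) = App (subst x v t1) (subst x v t2)
subst _ _ (Lit s)     = Lit s

-- One small step, mirroring rules (1)-(3); Nothing means no rule applies.
step :: Term -> Maybe Term
step (App t1 t2)
  | not (isValue t1) = (`App` t2) <$> step t1   -- rule (1)
  | not (isValue t2) = App t1 <$> step t2       -- rule (2)
step (App (Lam x t) v) = Just (subst x v t)     -- rule (3)
step _ = Nothing

-- Keep stepping until no rule applies.
eval :: Term -> Term
eval t = maybe t eval (step t)

-- The scoping-test term from the example below, with the literals and *
-- modeled as Lit values:
test :: Term
test =
  App (Lam "x" (App (App (Lam "x" (Lam "_" (Var "x"))) (Lit "lexical"))
                    (Lit "star")))
      (Lit "dynamic")
-- eval test == Lit "lexical"
```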

#### Example

Remember that magical expression that tells us what scoping rule the language uses? Here it is written as a term in the lambda calculus (where \( \_ \) is some fresh variable and \( * \) is an arbitrary value, needed only because abstractions in the lambda calculus must take an argument):

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \]

Convince yourself that this term matches the JavaScript and Emacs Lisp expressions above. How do we begin reducing it? First, we note that it’s an application. Both rules (1) and (3) are meant for applications. We see that the left term of the application is a value, so we apply rule (3). The term reduces to:

\[ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \]

Because \( x \) was not free in the body of the abstraction, no substitution was necessary and \( \texttt{"dynamic"} \) disappeared entirely. So already we know the lambda calculus uses lexical scoping! Let’s continue anyway.

This term might look hard to parse, but remember that application is left-associative. So the left side is \( \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \) and the right side is \( * \). Since the left side of the application is not a value (it can be reduced), we apply rule (1):

\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow t_1' }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow t_1' \: * } \left( 1 \right) \]

But what is \( t_1' \)? We use rule (3) to find it:

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} } \left( 3 \right) \]

Okay, so \( t_1' \) is \( \lambda \_ . \texttt{"lexical"} \). We are left with:

\[ \left( \lambda \_ . \texttt{"lexical"} \right) \: * \]

Applying rule (3) one last time gives:

\[ \texttt{"lexical"} \]

So the lambda calculus, as we’ve presented it here, uses lexical scoping. Here is the complete execution trace and relevant proofs:

\[ \frac{ }{ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * } \left( 3 \right) \]

\[ \frac{ }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} } \left( 3 \right) \]
\[ \frac{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . \texttt{"lexical"} }{ \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . \texttt{"lexical"} \right) \: * } \left( 1 \right) \]

\[ \frac{ }{ \left( \lambda \_ . \texttt{"lexical"} \right) \: * \rightarrow \texttt{"lexical"} } \left( 3 \right) \]

Note that all the proofs start with rule (3), since that’s the only rule without an antecedent.

### Semantics for dynamic scoping

Now that we have the tools to describe the semantics of a simple functional programming language, we can return to the main objective: to formalize dynamic scoping in the lambda calculus.

With the lexical scoping semantics, rule (3) handles applications by substituting the argument for the variable in the body of the abstraction (without evaluating the body). This will not work for dynamic scoping, since name resolution depends on the call stack and cannot be done statically. We will need to keep track of the variables that are currently in the “stack”.

Let \( \Gamma \) be a map from variables to values, and let \( \Gamma \oplus x : v \) denote the context formed by inserting \( x : v \) into \( \Gamma \) with replacement. Informally, \( \Gamma \) is the context that keeps track of the variables in the call stack and the values they are bound to.

Here are the reduction rules for dynamic scoping:

\[ \frac{ \Gamma \vdash t_1 \rightarrow t_1' }{ \Gamma \vdash t_1 \: t_2 \rightarrow t_1' \: t_2 } \left( 1 \right) \]

\[ \frac{ \Gamma \vdash t \rightarrow t' }{ \Gamma \vdash v \: t \rightarrow v \: t' } \left( 2 \right) \]

\[ \frac{ \Gamma \oplus x : v \vdash t \rightarrow t' }{ \Gamma \vdash \left( \lambda x . t \right) \: v \rightarrow \left( \lambda x . t' \right) \: v } \left( 3 \right) \]

\[ \frac{}{ \Gamma \vdash \left( \lambda x . v_1 \right) v_2 \rightarrow v_1 } \left( 4 \right) \]

\[ \frac{ x : v \in \Gamma}{ \Gamma \vdash x \rightarrow v } \left( 5 \right) \]

The first two rules are essentially the same as their lexically-scoped versions, except that they preserve the context \( \Gamma \) from antecedent to consequent. In other words, these two rules leave the stack unchanged.

Rule (3) reduces the body of an abstraction in an application. It adds \( x : v \) to the context, so that within the abstraction, the variable \( x \) is bound to \( v \).

When the body cannot be reduced any further, rule (4) eliminates the application, replacing it with the now-reduced body of the abstraction.

Rule (5) does dynamic name lookup: if a variable \( x \) is bound to a value \( v \) in the context \( \Gamma \), then \( x \) can be reduced to \( v \).
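
Here is the corresponding Haskell sketch, reusing `Term` and `isValue` from the lexical version above. The stack context \( \Gamma \) becomes a map from variable names to terms:

```haskell
import qualified Data.Map as Map

type Env = Map.Map String Term   -- the stack context Γ

-- One dynamically-scoped step under Γ, mirroring rules (1)-(5).
stepDyn :: Env -> Term -> Maybe Term
stepDyn env (App t1 t2)
  | not (isValue t1) = (`App` t2) <$> stepDyn env t1   -- rule (1)
  | not (isValue t2) = App t1 <$> stepDyn env t2       -- rule (2)
stepDyn env (App (Lam x t) v)
  | not (isValue t)  =                                 -- rule (3)
      (\t' -> App (Lam x t') v) <$> stepDyn (Map.insert x v env) t
  | otherwise        = Just t                          -- rule (4)
stepDyn env (Var x)  = Map.lookup x env                -- rule (5)
stepDyn _   _        = Nothing

evalDyn :: Env -> Term -> Term
evalDyn env t = maybe t (evalDyn env) (stepDyn env t)

-- evalDyn Map.empty test == Lit "dynamic"
```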

#### Example

Let’s evaluate this familiar term to see how it fares with the new semantics:

\[ \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \]

Here is the whole execution trace, and all relevant proofs:

\[ \frac{}{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . x } \left( 4 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \rightarrow \lambda \_ . x }{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . x \right) \: * } \left( 1 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \rightarrow \left( \lambda \_ . x \right) \: * }{ \varnothing \vdash \left( \lambda x . \left( \lambda x . \lambda \_ . x \right) \: \texttt{"lexical"} \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{ x : \texttt{"dynamic"} \in \left\{ x : \texttt{"dynamic"}, \_ : * \right\} }{ \left\{ x : \texttt{"dynamic"}, \_ : * \right\} \vdash x \rightarrow \texttt{"dynamic"} } \left( 5 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"}, \_ : * \right\} \vdash x \rightarrow \texttt{"dynamic"} }{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \left( \lambda \_ . \texttt{"dynamic"} \right) \: * } \left( 3 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . x \right) \: * \rightarrow \left( \lambda \_ . \texttt{"dynamic"} \right) \: * }{ \varnothing \vdash \left( \lambda x . \left( \lambda \_ . x \right) \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{}{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \rightarrow \texttt{"dynamic"} } \left( 4 \right) \]
\[ \frac{ \left\{ x : \texttt{"dynamic"} \right\} \vdash \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \rightarrow \texttt{"dynamic"} }{ \varnothing \vdash \left( \lambda x . \left( \lambda \_ . \texttt{"dynamic"} \right) \: * \right) \: \texttt{"dynamic"} \rightarrow \left( \lambda x . \texttt{"dynamic"} \right) \: \texttt{"dynamic"} } \left( 3 \right) \]

\[ \frac{}{ \varnothing \vdash \left( \lambda x . \texttt{"dynamic"} \right) \: \texttt{"dynamic"} \rightarrow \texttt{"dynamic"} } \left( 4 \right) \]

So the original term evaluates to \( \texttt{"dynamic"} \), confirming that the rules give rise to dynamic scoping.

### Conclusion

I’m happy to report that the lambda calculus can be adapted to support dynamic scoping rules. Proofs in the resulting calculus are tractable, though a bit longer than those for the lexically-scoped calculus. Inspired by the notion of “context” from the inference rules of typed lambda calculi, we introduce a “stack context” for tracking the variables reachable by walking the call stack. Curiously, this technique seems to be equally capable of proving static properties of programs (e.g., type ascriptions) and managing dynamic scopes.

## Higher-rank and higher-kinded types

April 23, 2016

If you enjoy statically-typed functional programming, you’ve probably heard of higher-rank and higher-kinded types; otherwise, perhaps you might like a quick primer? 😊

### Background: what is parametric polymorphism?

You may be familiar with “generics” in Java, or “templates” in C++. *Parametric polymorphism*\(^1\) is the jargon that characterizes these features. It allows us to abstract types from values, analogous to how ordinary functions abstract values from values. For example, consider this function which chooses one of two strings at random:

```java
String chooseString(String head, String tail) {
  if (Math.random() < 0.5) {
    return head;
  } else {
    return tail;
  }
}
```

We might want to choose between values of other types too, so we abstract `String` away from this method as a generic type parameter `T`:

```java
<T> T chooseValue(T head, T tail) {
  if (Math.random() < 0.5) {
    return head;
  } else {
    return tail;
  }
}
```

Now we can compute `chooseValue("heads", "tails")` or `chooseValue(0, 1)` or apply it to values of any other type, as long as both arguments have the same type. That’s the essence of parametric polymorphism.

### Higher-rank types

Parametric polymorphism allows for “functions” from types to values. In the example above, `chooseValue` is a function which (implicitly) takes a type `T`, and then two arguments of type `T`, and returns a `T`.

You may know that in most modern programming languages, functions are first-class values: they can be stored in variables, passed to functions, etc. In most statically-typed programming languages, however, polymorphic functions are not first-class. You cannot pass `chooseValue` to another function or store it in a variable, and then apply it polymorphically later on. Whenever you want to refer to it, you must specify up front (explicitly or implicitly) a type for `T`, thereby converting it into a monomorphic function.

The idea of higher-rank types is to make polymorphic functions first-class, just like regular (monomorphic) functions.\(^2\) Here’s a contrived example that requires higher-rank types, and is thus not valid Java:

```java
String chooseStringWeird(
    <T> BiFunction<T, T, T> randomChoice,
    String head,
    String tail
) {
  if (randomChoice(true, false)) {
    return randomChoice(head, tail);
  } else {
    return randomChoice(tail, head);
  }
}

String bestEditor() {
  return chooseStringWeird(chooseValue, "emacs", "vim");
}
```

Higher-rank types don’t come up very often, and supporting them has the unfortunate consequence of making complete type inference undecidable.\(^3\) Even Haskell, the proverbial playground for popular polymorphism paradigms, requires a language extension to support them.
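
For the curious, here is roughly what the contrived example above looks like in Haskell with the `RankNTypes` extension enabled. This is a sketch: the random choice is replaced with a deterministic stand-in to keep it pure.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The first argument has a rank-2 type: it must itself be polymorphic,
-- because the body applies it at both Bool and String.
chooseStringWeird :: (forall a. a -> a -> a) -> String -> String -> String
chooseStringWeird choice h t =
  if choice True False
    then choice h t
    else choice t h

-- Deterministic stand-in for the random chooseValue above.
chooseValue :: a -> a -> a
chooseValue x _ = x

bestEditor :: String
bestEditor = chooseStringWeird chooseValue "emacs" "vim"   -- "emacs"
```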

### Higher-kinded types

If parametric polymorphism involves functions from types to values, you might wonder about functions from types to types. These type-level functions are called

*type operators*. An example of a type operator in Java is`Stack`

:Stack<String> names;

`Stack`

can be thought of as a function that takes a type (in this case, `String`

) and returns a type. With higher-kinded types, type operators are first-class types, similar to how higher-rank types treat polymorphic functions as first class values. In other words, type operators can be parameterized by other type operators. Java doesn’t support higher-kinded types, but, if it did, you would be able to write something like this:class GraphSearch<T> { T<Node> frontier; }

`GraphSearch` is a type operator, and its parameter `T` stands for another type operator such as `Stack` or `Queue`:

```java
GraphSearch<Stack> depthFirstSearch;
GraphSearch<Queue> breadthFirstSearch;
```

The “type” of a type is called a *kind*. The kind of an ordinary (“proper”) type is usually written `*`, and type constructors (unary type operators) have kind `* -> *`. `GraphSearch` has kind `(* -> *) -> *`. Types such as `GraphSearch` are called *higher-kinded* because they are higher-order functions on the level of types, and they are “typed” using kinds.
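
As a concrete sketch, here is a hypothetical Haskell transcription of `GraphSearch` (the `Node` type is a stand-in invented for this example); GHCi’s `:kind` command reports the kinds discussed above:

```haskell
data Node = Node   -- stand-in node type for this sketch

-- t is a type constructor of kind * -> *, so GraphSearch itself has
-- kind (* -> *) -> *.
newtype GraphSearch t = GraphSearch { frontier :: t Node }

depthFirstSearch :: GraphSearch []   -- lists standing in for Stack
depthFirstSearch = GraphSearch []

-- In GHCi:
--   :kind GraphSearch   -- GraphSearch :: (* -> *) -> *
--   :kind []            -- [] :: * -> *
```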

Unlike higher-rank types, higher-kinded types are commonplace in typed functional programming. `Functor` and `Monad` are examples of higher-kinded polymorphism in Haskell. `Monad` in particular is a simple idea, but has a reputation for being difficult to understand for non-functional programmers. I suspect part of this confusion is due to the fact that `Monad` is higher-kinded, which is an idea most programmers are not accustomed to. Maybe this article will help! 😊

### Conclusion

In summary, parametric polymorphism introduces functions from types to values, and type operators are functions from types to types. When these functions are “first-class” in the sense that they can be used in a higher-order fashion, they are called *higher-rank* and *higher-kinded*, respectively.

There is one possibility we haven’t explored: functions from values to types. These are called “dependent types” and open up a mind-blowing world of programming with proofs. Dependent types are outside the scope of this article, but they are 100% worth learning about if you are interested in type theory.

Speaking of further learning, I highly recommend Benjamin Pierce’s book *Types and Programming Languages*, which is, in my opinion, lightyears ahead of any other introductory resource on type systems. Chapter 23 introduces parametric polymorphism with higher-rank types (System F), and Chapter 29 discusses higher-kinded types (System \(\lambda_{\omega}\)).

I hope that was helpful! Let me know in the comments if anything could be explained better.

### Footnotes

[1] The term “polymorphism” is autological. In addition to parametric polymorphism, it can also refer to subtype polymorphism or ad hoc polymorphism.

[2] This kind of polymorphism is also called *rank-n polymorphism* or *first-class polymorphism*.

[3] This negative result is from J.B. Wells, 1999.


## Does Google execute JavaScript?

August 6, 2016

I’m told: *Yes, it’s 2016; of course Google executes JavaScript.* But I’m also told: *Server-side rendering is necessary for SEO.*

If Google can run JavaScript and thereby render client-side views, why is server-side rendering necessary for SEO? Okay, Google isn’t the only search engine, but it’s obviously an important one to optimize for.

Recently I ran a simple experiment to see to what extent the Google crawler understands dynamic content. I set up a web page at doesgoogleexecutejavascript.com which does the following:

- The HTML from the server contains text which says *“Google does not execute JavaScript.”*
- There is some inline JavaScript on the page that changes the text to *“Google executes JavaScript, but only if it is embedded in the document.”*
- The HTML also links to a script which, when loaded, changes the text to *“Google executes JavaScript, even if the script is fetched from the network. However, Google does not make AJAX requests.”*
- That script makes an AJAX request and updates the text with the response from the server. The server returns the message *“Google executes JavaScript and even makes AJAX requests.”*

After I launched this page, I linked to it from its GitHub repository and waited for Google to discover it. Within a day, I saw the following in Google search:

I waited a week to see if Google would re-crawl the page and run the JavaScript; no dice. Then I manually submitted the page to the index from the Google Search Console. Shortly thereafter, I saw this:

It seems Google is not guaranteed to run your JavaScript automatically. You may have to manually trigger a crawl. And, even then, Google apparently won’t do any AJAX requests your page may depend on, or at least it didn’t in my case.

**New Year’s Update (1/1/2017)**: Apparently, some time after I did this experiment, Google re-indexed the page and made the AJAX call this time:

Thanks, pul, for pointing that out.

My conclusion is: Google may or may not decide to run your JavaScript, and you don’t want your business to depend on its particular inclination of the day. Do server-side/universal/isomorphic rendering just to be safe.

## “Drag Me Down” for a cappella

May 9, 2016

I lead an a cappella ensemble at work, and we just performed my latest arrangement for the group. The song is *Drag Me Down*, a pop hit from last year. You can download the sheet music here. Enjoy!

**UPDATE (8/22/2016):** We did a studio recording of this song! Listen here:

## Huffgram: convert strings into memorizable phrases

March 4, 2016

Have you ever needed to memorize a phone number? IP address? Hex color code? I made just the thing for you!

Huffgram will convert any piece of text into a word phrase that is easier to memorize. You might use it to memorize a phone number, a password, or even some digits of π. When you need the original text back, Huffgram can recover it from the word phrase.

### How it works

Using a corpus composed of a handful of modern classic novels (e.g., *To Kill a Mockingbird*), I constructed a bunch of Huffman trees. Each unique word in the corpus gets its own Huffman tree, which is generated from the probability distribution over subsequent words. Thus, the Huffman trees approximate a bigram model. All of those Huffman trees (~900 KB of minified JSON) are loaded by the browser. You can download the file here.

Essentially, Huffgram interprets the input as compressed data and decompresses it using those Huffman trees. For each word \(w\), Huffgram “decompresses” the next word by traversing the Huffman tree for \(w\) according to the next few bits in the input. To choose the initial word, there is a Huffman tree generated from the probability distribution over words that start sentences in the corpus.
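
To illustrate the decoding loop, here is a small Haskell sketch with invented names (the real trees come from the JSON file mentioned above, and the end-of-input wrinkle is discussed next):

```haskell
-- A Huffman tree over words: leaves hold words, branches consume one bit.
data Huff = Leaf String | Branch Huff Huff

-- Walk one tree, consuming bits, to decode the next word.
decodeWord :: Huff -> [Bool] -> Maybe (String, [Bool])
decodeWord (Leaf w)     bits       = Just (w, bits)
decodeWord (Branch l r) (b : bits) = decodeWord (if b then r else l) bits
decodeWord (Branch _ _) []         = Nothing   -- ran out of bits mid-tree

-- Decode a phrase: start from the sentence-initial tree, then follow
-- the tree keyed by each decoded word (the bigram structure).
decodePhrase :: (String -> Huff) -> Huff -> [Bool] -> [String]
decodePhrase treeFor start = go start
  where
    go _    []   = []
    go tree bits = case decodeWord tree bits of
      Just (w, rest) -> w : go (treeFor w) rest
      Nothing        -> []   -- leftover bits: handled via end words, below
```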

You might wonder what happens if, at the end of the input, there aren’t enough bits left to reach the bottom of the last Huffman tree. We could try finishing the path in some arbitrary way (e.g., always go left) to get to a word, but then the mapping wouldn’t be invertible. So we might have some bits at the end which don’t complete a word. To resolve that, we keep a special list of end words, and we treat those final bits as an index into that list.