## Super quick intro to monads

February 28, 2014

Monads tend to come up a lot in functional programming. All you need to know about functional programming for now is that it’s a paradigm in which mutable data is generally discouraged, or even forbidden. Computation generally happens by composing pure functions together. A pure function is one that does nothing but return the result of a computation. It doesn’t print anything, store anything, ask for user input, etc. It just manipulates the data you pass it and returns more data.

You may be more familiar with object-oriented programming. Let me translate monads into OOP-speak.

`Monad` is an interface, or an abstract class if you like. It has a constructor and a `bind` method:

```python
class Monad:
    def __init__(self, x):
        raise NotImplementedError

    def bind(self, f):
        raise NotImplementedError
```

`x` can be anything. `f` is a function that takes something of the same type as `x` (presumably from the monad) and returns a new monad. Lastly, `bind` also returns a new monad (often from the result of `f`). Literally that’s it. Maybe you don’t know what it’s used for, but at least now you’ve seen the interface.

Typically, a monad acts as some sort of container, and `bind` applies a function to what it contains. Let’s make a monad that stores a single element:

```python
class SimpleMonad(Monad):
    def __init__(self, x):
        self.x = x

    def bind(self, f):
        return f(self.x)
```

To make this a little more concrete, this is how you use it:

```python
def double(x):
    return SimpleMonad(x * 2)

def add1(x):
    return SimpleMonad(x + 1)

foo = SimpleMonad(5).bind(double).bind(add1)
```
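For reference, here is that chain as a self-contained script (the definitions are repeated from above so it runs on its own):

```python
class Monad:
    def __init__(self, x):
        raise NotImplementedError

    def bind(self, f):
        raise NotImplementedError


class SimpleMonad(Monad):
    def __init__(self, x):
        self.x = x

    def bind(self, f):
        return f(self.x)


def double(x):
    return SimpleMonad(x * 2)


def add1(x):
    return SimpleMonad(x + 1)


foo = SimpleMonad(5).bind(double).bind(add1)
print(foo.x)  # 5 doubled is 10, plus 1 gives 11
```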

Okay, so you can chain functions together with `bind`. But why would we do that? Equivalently, we could have just written `add1(double(5))`, which is much shorter.

The main idea is that monads don’t have to just call the functions you pass to `bind`. They can change the rules. For example, here is a monad that allows you to chain operations without worrying about null pointer errors.

```python
class MaybeMonad(Monad):
    def __init__(self, x):
        self.x = x

    def bind(self, f):
        if self.x is None:
            return self
        else:
            return f(self.x)
```
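For example, here is how `MaybeMonad` short-circuits a chain of dictionary lookups (`lookup` is a helper of my own, not from the original post; the definitions are repeated so the snippet runs standalone):

```python
class Monad:
    def __init__(self, x):
        raise NotImplementedError

    def bind(self, f):
        raise NotImplementedError


class MaybeMonad(Monad):
    def __init__(self, x):
        self.x = x

    def bind(self, f):
        if self.x is None:
            return self
        else:
            return f(self.x)


def lookup(key):
    # Returns a function suitable for bind: dict -> MaybeMonad
    def go(table):
        return MaybeMonad(table.get(key))
    return go


data = {"a": {"b": 42}}
hit = MaybeMonad(data).bind(lookup("a")).bind(lookup("b"))
miss = MaybeMonad(data).bind(lookup("x")).bind(lookup("b"))
print(hit.x)   # 42
print(miss.x)  # None; the second lookup never ran
```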

If any intermediate computation in a sequence of operations results in `None`, the entire expression becomes `None` and no further operations are performed (much like how `NaN` works in floating-point arithmetic, if you’re familiar with that). If you spend more time with monads, you’ll find that you can also use them to implement exception handling, mutable state, concurrency, and a bunch of other things.

Remember: functional programming is all about writing programs by composing functions together. *The power of monads is that they allow you to change the rules for how functions are composed.*

When I was learning about monads, the main things I was confused about were:

- It seems like using monads requires you to write a lot of code, e.g., `SimpleMonad(5).bind(double).bind(add1)` instead of `add1(double(5))`. Do people really go through all this trouble?
- How do I get the value *out* of the monad? Can I access it like `monad.x`?
- I hear that Haskell doesn’t let you write “impure” functions that have side effects, but I know for a fact that Haskell programs can print to the screen and write to files. What’s the deal? Do monads somehow resolve this?

Well, I know the answers now, so let me share the love:

- Haskell has a special syntax for monads and Python doesn’t—so that’s why they don’t look very nice in Python. It’s actually quite brilliant. Check it out if you have time. Using it feels very natural and it looks a lot like imperative code.
- Some monads provide a “getter” function that allows you to get the value back out. But a monad is not required to have one. For some monads, it doesn’t make sense to take a value out of it. A monad is not required to act like a container. See #3.
- Yep—all functions in Haskell are pure. Haskell has a neat trick for doing I/O. You can think of a Haskell program as a pure computation that returns a tree-like data structure, and this tree represents another program that does impure things.

  Now, building this tree can be done in a purely functional way—as is required in Haskell. But when you “run” a Haskell program, two things happen: 1) your pure Haskell computation is executed, resulting in this tree-like data structure, 2) the tree-program from step 1 (which can print to the screen and write to files) is also executed.

  It just so happens that a certain monad, especially when combined with Haskell’s special monad syntax, makes it easy to build this tree. This monad is called `IO` in Haskell. The bind operation does not call the function you pass it, as it did in our monad examples above. Rather, it just stores it in the tree. The runtime will call it when interpreting the tree. That’s the high-level idea. In this post, I implement this idea in Python, which will probably make it a little more clear for you.

  Lastly, the `IO` monad does not have a “getter” function. For example, if you have an `IO` action that asks a user for a string of input, you cannot extract the string from the monad. This is because the `IO` action doesn’t actually have a string to give you—it only represents a computation that returns a string. When you `bind` a function `f` to this monad, you’re building a new `IO` action that asks the user for a string and then calls `f` on it.
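A minimal Python sketch of this deferred-execution idea (the names `IO`, `pure`, and `thunk` here are my own, not from the linked post):

```python
class IO:
    """A description of an I/O action (a node in the "tree"), not the action itself."""

    def __init__(self, thunk):
        self.thunk = thunk  # zero-argument function that performs the effect

    def bind(self, f):
        # Build a new description: when run, perform self, pass the result
        # to f, and then perform the IO action that f returned.
        # Nothing executes yet; bind only grows the tree.
        return IO(lambda: f(self.thunk()).thunk())


def pure(x):
    # An action that performs no I/O and just yields x
    return IO(lambda: x)


program = pure("world").bind(
    lambda name: IO(lambda: print("hello, " + name)))

# At this point nothing has printed. The "runtime" step interprets the tree:
program.thunk()  # prints: hello, world
```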

To summarize, monads let you change what it means to compose functions together. You might want to keep track of some extra information, store the functions in a tree, call them in a different order, call them multiple times, skip some functions entirely (based on the value being passed to them), etc.

You don’t have to use them, but now you don’t have to fear them either!

*EDIT: Fixed a typo and added the* `MaybeMonad` *example at /u/wargotad’s suggestion.*

## Region-based memory management

October 19, 2013

If I were to design a programming language, it probably would be very different from mainstream languages. Every day new wacky ideas about programming languages come to me—it’s an unhealthy obsession. Here’s what I was thinking about today.

### Manual memory management

Before I spill the beans, let me give you some context. In the beginning, there was manual memory management, like in this C example:

```c
// allocate a kilobyte of memory
char* buffer = malloc(1024);
...
// free the memory
free(buffer);
```

For every call to `malloc`, you need a corresponding call to `free` to release the memory back to the operating system. This is an annoying requirement, and sometimes software engineers forget to call `free`, resulting in memory leaks.

### Garbage collection

So a famous computer scientist named John McCarthy invented the garbage collector, which is a special program that periodically frees memory when you’re done with it so you don’t have to. So, for example, here is what the above example would look like in Java, a programming language with a garbage collector:

```java
// allocate a kilobyte of memory
byte[] buffer = new byte[1024];
...
// the memory is automatically released
```

Notice we didn’t have to deallocate the memory—but this comes at a cost. Garbage collectors can incur noticeable performance hits, especially if they are poorly written. For example, my cell phone runs Android, and I frequently notice my phone lock up for a few hundred milliseconds while the garbage collector reclaims unused memory. Applications that have real-time requirements (e.g., audio DSPs, games, operating systems, pacemakers, autopilots, etc.) typically do not use garbage collection, because the degradation in performance or lack of predictability is unacceptable.

### Resource Acquisition Is Initialization

So do we have to revert to C-style manual memory management for these kinds of programs? C++ has a different solution to this problem, called Resource Acquisition Is Initialization (RAII). The idea is that objects allocate whatever memory they need in their constructor and release that memory in their destructor, which automatically runs when the object goes out of scope. Check this out:

```cpp
// allocate a kilobyte of memory
vector<char> buffer(1024);
...
// the memory is automatically released
```

It looks a lot like garbage collection, right? But here the memory is released exactly when `buffer` goes out of scope. So instead of a garbage collector which runs at unpredictable times and frees a large number of objects at once (which is slow), objects are automatically released one at a time, in a deterministic, predictable order, as they would be if you wrote it in C.

### Limitations of RAII

In RAII, heap-allocated memory is tied to the lifetime of an object on the stack. But if we’re doing that, why don’t we just allocate all memory on the stack? There are two issues with that:

- Maybe the stack isn’t big enough. A typical stack size is 8 megabytes. What if I want a gigabyte?
- Maybe I want the memory to outlive the current stack frame, e.g., I want to return it from a function.

RAII is a good solution to *Issue 1*. Does it also fix *Issue 2*? Not really—the whole point of RAII is that the memory is released automatically when the function returns. So if we held onto that memory after the function returned, when would it get released? C++ has some mechanisms (e.g., smart pointers, copy constructors, move semantics, etc.) to let you transfer ownership of that memory from the function call to the surrounding scope. But in my opinion, these non-orthogonal concepts are inelegant hacks that are only necessary because the language is poorly designed (and overly complicated!), and they remind me too much of manual memory management.

But what concerns me more is that using these workarounds is optional—the possibility of memory leaks still exists. What I want is a system where a) heap allocations are possible, b) memory leaks are fundamentally impossible (in the sense that we can easily pinpoint exactly where every piece of memory is released), and c) there is no unpredictable performance penalty due to a garbage collector running in the background. It’s easy to design a system with any two of these properties, but can we have all three?

### Region-based memory management

Consider this code in a Java-like (call-by-sharing) language that does not have garbage collection:

```
int[] compute_big_array() {
    int[] data = new int[1048576];
    ...
    return data;
}

void foobar() {
    int[] big_array = compute_big_array();
    ...
    delete [] big_array;
}
```

Without garbage collection, it is the engineer’s responsibility to release `big_array` at the end of `foobar`. We want to return the array from `compute_big_array`, so vanilla RAII can’t help us here.

RAII ties heap-allocated memory to lexical scopes. But oftentimes memory should be shared between multiple, unrelated lexical scopes. In the example above, an RAII-like approach to memory management would free `data` when `compute_big_array` returns (perhaps after making a copy of it). But we want the original memory to be available to `foobar` after `compute_big_array` returns. So this memory shouldn’t be bound to the lexical scope of `compute_big_array`, but it also shouldn’t be tied to the lexical scope of `foobar`.

Notice, however, that we only need the memory to exist during the execution of `foobar`. But this may include calls to other functions defined elsewhere (in unrelated lexical scopes), and we may want to make the memory available to those functions as well. To use the domain lingo, we want the memory to be available in the *dynamic scope* of `foobar`. Can we design a system that ties the memory to this dynamic (not necessarily lexical) scope, guaranteeing that it will be freed when the program execution leaves that scope?

Let’s come up with a new strategy that allows us to specify a dynamic scope for each memory allocation which dictates the lifetime of that memory.

```
int[] compute_big_array(MemoryAllocator<int> allocator) {
    int[] data = allocator(1048576);
    ...
    return data;
}

void foobar() {
    MemoryAllocator<int> allocator = makeAllocator<int>();
    int[] big_array = compute_big_array(allocator);
    ...
}
```

Here’s how it works. The language gives you a parametrically-polymorphic `makeAllocator` function that you can use to create a memory allocator (and the lifetime of this allocator is called a *region*). If you want to allocate some memory, you have to use an allocator. When an allocator goes out of scope (i.e., the program leaves the region), its destructor frees any memory that it allocated. Every variable goes out of scope at some point, so any memory you allocate is guaranteed to be freed. But unlike garbage collection, there is no extra performance penalty! It may be a little cumbersome to pass around memory allocators, but for some applications the strong safety guarantees make it worth it.

### Case study: a web server

Web servers are long-lived programs that serve web pages to clients, and it’s not uncommon for a web server to run continuously for months or longer. It is crucial that web servers do not leak memory, because otherwise they would quickly consume all available resources and crash. With region-based memory management, we can be sure that no memory is leaked from one request to the next, which is a stronger guarantee than most garbage collectors provide. Consider a simple single-threaded web server:

```
void main() {
    while (true) {
        MemoryAllocator<int> allocator = makeAllocator<int>();
        // listen on port 80 and handle requests, using allocator as necessary
    }
}
```

When the server is done processing a request, the allocator goes out of scope and all memory allocated for that request is deallocated. With most garbage collectors, you have no guarantees about when the collector will run and what it will free. So a) our web server will have a tiny memory footprint, b) it is guaranteed not to leak memory from one request to the next, and c) it does not have the performance cost or unpredictability of a garbage collector.
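Python is garbage-collected, so a faithful implementation isn’t possible there, but the per-request region pattern can be sketched as an interface (all names below are hypothetical, not from the post):

```python
class Region:
    """Owns every buffer allocated through it; releases them all at once."""

    def __init__(self):
        self.buffers = []

    def alloc(self, n):
        buf = bytearray(n)
        self.buffers.append(buf)
        return buf

    def close(self):
        # A real region allocator would return the memory to the OS here;
        # in Python we can only drop the references.
        self.buffers.clear()


def handle_request(allocator):
    # Any memory this request needs comes from the caller's region
    scratch = allocator.alloc(1024)
    return len(scratch)


for _ in range(3):  # stand-in for the server's "while (true)" loop
    region = Region()
    handle_request(region)
    region.close()  # all memory allocated for this request is released here
```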

### Conclusion

Here are some cool things about region-based memory management:

- Regions cannot leak their memory to the outside world. All heap allocations are performed inside regions, so no memory is ever leaked.
- Often, when functions in C return pointers to memory, it’s not clear who is responsible for deallocating it. Systems based on regions do not have this problem: if a function allocates and returns memory, you have to provide it with an allocator that is responsible for this memory.
- There is no performance penalty due to a garbage collector running in the background. Memory allocation is deterministic and predictable.
- Writing a good garbage collector is hard. Writing a good region-based allocator is easy.

However, there are a couple of disadvantages as well:

- Region-based memory management requires more bookkeeping (you have to declare regions and keep track of memory allocators). This can be alleviated by having functions declare implicit regions around their bodies, and allocating memory to those regions by default.
- Sometimes, the lifetime of objects does not tie well to scopes—neither lexical nor dynamic. For example, consider a web server that needs some data to persist between requests, but that data grows and some of it needs to be freed occasionally (a memory pool might be a good solution to this). Another example is data that is shared between threads of a program.

## Let there be light!

May 27, 2013

I usually write about computer science, engineering, or music, but today I’m going to digress and talk about physics. In particular, I’ll explain how Maxwell’s equations predict the existence of electromagnetic waves.

I will assume you have a basic familiarity with vector calculus and partial differential equations, but no knowledge of electrodynamics.

### Electric and magnetic fields

There are two important vector fields you need to know about: the electric field \( \mathbf{E} \) and the magnetic field \( \mathbf{B} \). What is the significance of these two fields? Mathematically, they describe the forces that are exerted on charged particles due to the charges of other particles. This is called the Lorentz force, and is given by:

\[ \mathbf{F} = q \left( \mathbf{E} + \mathbf{v} \times \mathbf{B} \right) \]

Here, \( \mathbf{v} \) is the velocity of the particle. Of course, this velocity depends on the choice of reference frame. Due to Faraday’s Law and Ampère’s Law (both described below), we’ll see that the choice of reference frame doesn’t actually matter (phew!). The factor \( q \) is the charge of the particle. For an electron, \( q \approx -1.602 \times 10^{-19} \) Coulombs. Notice that this value is negative, so electrons tend to accelerate in the opposite direction of the electric field.

### Maxwell’s equations

Maxwell’s equations relate electric fields, magnetic fields, charges, and current. They come in various forms—here I’ll describe them in their microscopic, differential form. We’ll start with Gauss’s Law:

\[ \boldsymbol{\nabla} \cdot \mathbf{E} = \frac{ \rho }{ \epsilon_0 } \]

This states that for any point in space, the divergence of the electric field \( \mathbf{E} \) is proportional to the charge density \( \rho \) (the constant factor \( \epsilon_0 \) is called the permittivity of free space). More simply, Gauss’s Law tells us that charges create electric fields. There is also a similar law for the magnetic field \( \mathbf{B} \):

\[ \boldsymbol{\nabla} \cdot \mathbf{B} = 0 \]

In other words, magnetic “charges” do not exist; magnetic field lines are always loops. Speaking of magnetic fields, the next law states that, in addition to charges, a time-varying magnetic field can also affect the electric field:

\[ \boldsymbol{\nabla} \times \mathbf{E} = - \frac{ \partial \mathbf{B} }{ \partial t } \]

This is called Faraday’s Law. For example, if I pick an inertial reference frame \( X \) with a stationary electron and move a nearby magnet (which generates a magnetic field) through space, the magnetic field will be time-varying, so an electric field is induced due to Faraday’s Law. Let’s say this electric field imparts a Lorentz force \( f \) on the electron. From the reference frame \( Y \) of the magnet, the magnetic field appears stationary but the electron is moving and will feel the same Lorentz force \( f \) due to the second term in the Lorentz force. So the choice of reference frame does not matter.

The final equation is Ampère’s Law:

\[ \boldsymbol{\nabla} \times \mathbf{B} = \mu_0 \left( \mathbf{J} + \epsilon_0 \frac{ \partial \mathbf{E} }{ \partial t } \right) \]

The constant \( \mu_0 \) is called the permeability of free space. \( \mathbf{J} \) is the current density, which describes how current is locally flowing at a point in space. In English, Ampère’s Law states that a magnetic field is generated by electric current or by a time-varying electric field (the latter term is called “Maxwell’s correction”). Just as with Faraday’s Law, the derivative term will depend on the choice of reference frame, but in the end the Lorentz force always comes out to be the same.

Amazingly, from these simple laws, you can derive all of classical electrodynamics!

### The Helmholtz wave equation

In a vacuum, there is no charge or current density. Then we can reduce Maxwell’s equations to:

\[ \boldsymbol{\nabla} \cdot \mathbf{E} = 0 \]
\[ \boldsymbol{\nabla} \cdot \mathbf{B} = 0 \]
\[ \boldsymbol{\nabla} \times \mathbf{E} = - \frac{ \partial \mathbf{B} }{ \partial t } \]
\[ \boldsymbol{\nabla} \times \mathbf{B} = \mu_0 \epsilon_0 \frac{ \partial \mathbf{E} }{ \partial t } \]

Before we continue, we need the following vector calculus identity:

\[ \boldsymbol{\nabla} \times \left( \boldsymbol{\nabla} \times \mathbf{E} \right) = \boldsymbol{\nabla} \left( \boldsymbol{\nabla} \cdot \mathbf{E} \right) - \boldsymbol{\nabla}^2 \mathbf{E} \]

In other words, the curl of the curl is equal to the gradient of the divergence minus the vector Laplacian (the proof follows directly from the definition of the vector Laplacian). For this identity, I chose the letter \( \mathbf{E} \) because we will be taking the curl of the curl of the electric field:

\[ \boldsymbol{\nabla} \times \left( \boldsymbol{\nabla} \times \mathbf{E} \right) \]

But first, let’s take the curl of Faraday’s Law to get this in terms of the magnetic field:

\[ \boldsymbol{\nabla} \times \left( \boldsymbol{\nabla} \times \mathbf{E} \right) = - \frac{ \partial \left( \boldsymbol{\nabla} \times \mathbf{B} \right) }{ \partial t } \]

Now we apply our vector identity to the left-hand side:

\[ \boldsymbol{\nabla} \left( \boldsymbol{\nabla} \cdot \mathbf{E} \right) - \boldsymbol{\nabla}^2 \mathbf{E} = - \frac{ \partial \left( \boldsymbol{\nabla} \times \mathbf{B} \right) }{ \partial t } \]

By Gauss’s Law in a vacuum, we can eliminate the leftmost term:

\[ \boldsymbol{\nabla}^2 \mathbf{E} = \frac{ \partial \left( \boldsymbol{\nabla} \times \mathbf{B} \right) }{ \partial t } \]

Now we can use Ampère’s Law in a vacuum on the right-hand side:

\[ \boldsymbol{\nabla}^2 \mathbf{E} = \mu_0 \epsilon_0 \frac{ \partial^2 \mathbf{E} }{ \partial t^2 } \]

This is called the *Helmholtz wave equation*. It states that the second spatial derivative (the vector Laplacian) of the electric field is proportional to the second time derivative, where the proportionality constant is \( \mu_0 \epsilon_0 \).

### Electromagnetic waves

In a classical setting (no relativistic or quantum effects), Maxwell’s equations always hold—and we’ve just shown how they imply the Helmholtz wave equation in a vacuum. Let’s solve the wave equation. Once we have a solution, we’ll check that it agrees with Maxwell’s equations (not all do).

One solution is \( \mathbf{E} = \mathbf{0} \) (which implies \( \mathbf{B} = \mathbf{0} \)), but it’s a bit prosaic to call that a *wave*. Still, a quick look at Maxwell’s equations reveals that this solution does not violate the laws of classical electrodynamics, so we can have \( \mathbf{E} = \mathbf{B} = \mathbf{0} \) in a vacuum. In other words, darkness is physically possible, which is great because otherwise I would have difficulty sleeping at night.

The nice thing about Maxwell’s equations is that they are *linear* in \( \mathbf{E} \) and \( \mathbf{B} \). In other words, if we find multiple solutions to these equations, their superposition is also a solution. This fact is very useful in harmonic analysis. In particular, we can represent almost all functions as the superposition of basic sinusoidal waves with the Fourier series. Conveniently, the form of the Helmholtz wave equation suggests a sinusoid might be a solution. If it works and agrees with Maxwell’s equations, then the superposition principle allows us to predict the existence of all kinds of electromagnetic waves.

With that in mind, let’s try a solution of the form \( \mathbf{E} = \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \), for some unknown \( E_0 \) (amplitude), \( \omega \) (frequency), \( k \) (wavenumber), and \( \phi \) (phase shift). We say that this wave is \( \hat{x} \)-polarized and propagates in the \( \hat{z} \) direction.

First, we’ll try it with the wave equation:

\[ \boldsymbol{\nabla}^2 \mathbf{E} = \mu_0 \epsilon_0 \frac{ \partial^2 \mathbf{E} }{ \partial t^2 } \]
\[ \boldsymbol{\nabla}^2 \left( \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \right) = \mu_0 \epsilon_0 \frac{ \partial^2 }{ \partial t^2 } \left( \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \right) \]
\[ - \hat{x} E_0 k^2 \cos \left( \omega t - k z - \phi \right) = - \hat{x} \mu_0 \epsilon_0 E_0 \omega^2 \cos \left( \omega t - k z - \phi \right) \]
\[ k^2 = \mu_0 \epsilon_0 \omega^2 \]
\[ \frac{k}{\omega} = \sqrt{\mu_0 \epsilon_0} \]

So \( \mathbf{E} = \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \) is a solution to the wave equation as long as \( \frac{k}{\omega} = \sqrt{\mu_0 \epsilon_0} \). The quotient of the frequency \( \omega \) and the wavenumber \( k \) is the speed \( c \) of the wave, so we can see that in a vacuum \(c = \frac{\omega}{k} = \frac{1}{\sqrt{ \mu_0 \epsilon_0 }} \). In SI units, that gives 299,792,458 m/s, which seems just right.
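As a numerical sanity check, we can plug in the standard SI values of the two constants (the conventional \( \mu_0 = 4\pi \times 10^{-7} \) and the measured \( \epsilon_0 \)):

```python
import math

mu0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
eps0 = 8.8541878128e-12    # permittivity of free space, F/m

c = 1 / math.sqrt(mu0 * eps0)
print(c)  # about 2.998e8 m/s: the speed of light
```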

Let’s test our solution against Maxwell’s equations in a vacuum. Taking the divergence of \( \mathbf{E} \) yields \( 0 \) because the wave is \( \hat{x} \)-polarized but has no dependence on \( \hat{x} \). Therefore Gauss’s Law is satisfied. The rest of Maxwell’s equations involve \( \mathbf{B} \), so we must find a \( \mathbf{B} \) that works with our trial solution \( \mathbf{E} \).

Let’s start with Faraday’s law:

\[ \boldsymbol{\nabla} \times \mathbf{E} = - \frac{ \partial \mathbf{B} }{ \partial t } \]
\[ \boldsymbol{\nabla} \times \left( \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \right) = - \frac{ \partial \mathbf{B} }{ \partial t } \]
\[ \hat{y} E_0 k \sin \left( \omega t - k z - \phi \right) = - \frac{ \partial \mathbf{B} }{ \partial t } \]

This tells us that the time derivative of \( \mathbf{B} \) is a sinusoid polarized in the \( \hat{y} \) direction. Therefore:

\[ \mathbf{B} = \hat{y} E_0 \frac{k}{\omega} \cos \left( \omega t - k z - \phi \right) + \mathbf{C} \]

The vector \( \mathbf{C} \) is the constant of integration. From the Helmholtz equation we saw that \( \frac{k}{\omega} = \sqrt{\mu_0 \epsilon_0} \), so we can rewrite \( \mathbf{B} \) as:

\[ \mathbf{B} = \hat{y} E_0 \sqrt{\mu_0 \epsilon_0} \cos \left( \omega t - k z - \phi \right) + \mathbf{C} \]

Now let’s test \( \mathbf{E} \) and \( \mathbf{B} \) against Ampère’s Law:

\[ \boldsymbol{\nabla} \times \mathbf{B} = \mu_0 \epsilon_0 \frac{ \partial \mathbf{E} }{ \partial t } \]
\[ \boldsymbol{\nabla} \times \left( \hat{y} E_0 \sqrt{\mu_0 \epsilon_0} \cos \left( \omega t - k z - \phi \right) + \mathbf{C} \right) = \mu_0 \epsilon_0 \frac{ \partial }{ \partial t } \left( \hat{x} E_0 \cos \left( \omega t - k z - \phi \right) \right) \]
\[ - \hat{x} E_0 k \sqrt{\mu_0 \epsilon_0} \sin \left( \omega t - k z - \phi \right) = - \hat{x} \mu_0 \epsilon_0 \omega E_0 \sin \left( \omega t - k z - \phi \right) \]
\[ \frac{k}{\omega} = \sqrt{ \mu_0 \epsilon_0 } \]

This is the same constraint that we got from the Helmholtz equation (so it works). Finally, we check Gauss’s Law for Magnetism:

\[ \boldsymbol{\nabla} \cdot \mathbf{B} = 0 \]
\[ \boldsymbol{\nabla} \cdot \left( \hat{y} E_0 \frac{k}{\omega} \cos \left( \omega t - k z - \phi \right) + \mathbf{C} \right) = 0 \]
\[ 0 = 0 \]

So our sinusoidal \( \mathbf{E} \) and \( \mathbf{B} \) fields agree with all of Maxwell’s equations. Since \( E_0 \), \( \omega \), and \( \phi \) were arbitrary (and we saw that \( k \) depends on \( \omega \)), any amplitude, frequency, and phase shift is valid (but the wave always has constant velocity \(c = \frac{1}{\sqrt{ \mu_0 \epsilon_0 }} \) in a vacuum). By the superposition principle, we can take any linear combination of waves of this form without violating Maxwell’s equations. This means, at the very least, any wave which can be decomposed into basic sinusoids (almost every wave) is valid. Furthermore, the polarization of the wave \( \hat{x} \) and the direction of propagation \( \hat{z} \) were also chosen arbitrarily—Maxwell’s equations do not have a preference, as long as they are perpendicular.

So Maxwell’s equations predict that a lot of electromagnetic waves are possible. Let’s experimentally verify our findings—oh, you’ve done that by reading this article!

## Highest note on a guitar

December 29, 2012

I just bought a new guitar, and I asked myself: *what is the highest note you can play in standard tuning?*

When plucked, the high E string vibrates at \(440 \times 2^{-\frac{5}{12}}\) Hz, or approximately 329.628 Hz.

If we pluck this string while fingering the innermost fret (assuming there are 24 frets), we hit the E a major 17th above middle C, with frequency \(440 \times 2^{\frac{19}{12}}\) Hz, or approximately 1318.510 Hz. We can also place our finger just past this fret and hit the next F or maybe even the F# before we run out of fretboard.

However, we can also pluck the string on the other side of the fret to get even higher notes. The highest note we can produce this way happens when we pluck the high E string on the far side of the first fret, with the finger pressing the string against this fret from the inward side.

This note will be higher than any audible harmonics you will be able to produce. Let’s calculate its pitch. To do this, we need to know the width of the space between the nut and the first fret (the string is effectively this long when we press it against the first fret from the other side).

Halving the length of a string raises its pitch by an octave. This generalizes, of course, to arbitrary intervals and length ratios. If a string is length \(x\), we can raise its pitch by \(n\) semitones by reducing its length to \(\left( \frac{1}{2} \right)^{\frac{n}{12}} x\).

Using this we can calculate the width of the first segment of the fretboard. Each segment of the fretboard constitutes a semitone in pitch. If we want to raise the high E string (of length \(x\)) by a semitone, we need to shorten it to \(\left( \frac{1}{2} \right)^{\frac{1}{12}} x\), which is about \(0.944x\). This means the piece we cut off, which is the part we’ll be playing, has length \(\left( 1 - \left( \frac{1}{2} \right)^{\frac{1}{12}} \right) x\), or about \(0.056x\).

This means we changed the length of the string by a factor of about 0.056. We can use the formula above to calculate how many semitones by which we have increased. Solving \(1 - \left( \frac{1}{2} \right)^{\frac{1}{12}} = \left( \frac{1}{2} \right)^{\frac{n}{12}}\) for \(n\), we find that we have gone up about \(n = 49.862\) semitones (about 4 octaves and a major 2nd) from the E, which brings us approximately to a really high F#, a tritone above the highest note on a piano. The frequency of this note is exactly

\[ 440 \times 2 ^ { \left( 12 \log \left( 1 - \left( \frac{1}{2} \right) ^ { \frac{1}{12} } \right) / \log \left( \frac{1}{2} \right) - 5 \right) / 12 } \text{ Hz,} \]

which is roughly 5.873 kHz. Some adults will not be able to hear this note.
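These numbers are easy to verify with a short script (frequencies follow from the inverse-length relationship above):

```python
import math

open_E = 440 * 2 ** (-5 / 12)   # open high E string, ~329.628 Hz

# fraction of the string lying between the nut and the first fret
frac = 1 - 0.5 ** (1 / 12)      # ~0.0561

# semitones gained by shortening the string to that fraction of its length
n = 12 * math.log(frac) / math.log(0.5)
freq = open_E / frac            # equivalently open_E * 2 ** (n / 12)

print(round(n, 3))   # 49.862 semitones
print(round(freq))   # 5873 Hz
```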

As my colleague Ben Sena points out, you can hit even higher notes by pressing the string against two frets and plucking in between. The highest note is then produced by plucking between the innermost two frets on the high E string. Let’s calculate the frequency of this note, assuming the guitar has 24 frets. If the string has length \(x\), then the length of this piece is \( \left( \left( \frac{1}{2} \right) ^ { \frac{23}{12} } - \left( \frac{1}{2} \right) ^ { \frac{24}{12} } \right) x \), which is about \( \left( 1.487 \times 10^{-2} \right) x\). Using the method we used above, we solve for \(n\) in the equation \( \left( \frac{1}{2} \right) ^ { \frac{23}{12} } - \left( \frac{1}{2} \right) ^ { \frac{24}{12} } = \left( \frac{1}{2} \right) ^ { \frac{n}{12} } \). We find that \( n = 12 \log \left( \left( \frac{1}{2} \right) ^ { \frac{23}{12} } - \left( \frac{1}{2} \right) ^ { \frac{24}{12} } \right) / \log \frac{1}{2} \), which is about 72.862 semitones, raising the high E string by 6 octaves and almost a minor 2nd. This means that, assuming a 24-fret guitar, the highest note you can produce is approximately an F. The frequency of this note is exactly

\[ 440 \times 2 ^ { \left( 12 \log \left( \left( \frac{1}{2} \right) ^ { \frac{23}{12} } - \left( \frac{1}{2} \right) ^ { \frac{24}{12} } \right) / \log \left( \frac{1}{2} \right) - 5 \right) / 12 } \text{ Hz,} \]

which is about **22.174 kHz**. You will not be able to hear this note!

This analysis uses a very simple model of a guitar. We assume that the strings are perfectly elastic and ignore strain forces, air resistance, and other factors that are likely to affect the sound in these extreme conditions.

**TLDR:** F, with some waving of hands.

## Algebraic data types

August 21, 2012

### Primitive types

A type system starts with some primitive types. Some examples include:

- `Int` (the type of integers, perhaps limited to some range)
- `Real` (the type of real numbers, perhaps represented as IEEE 754 floating-point numbers)
- `Char` (the type of Unicode code points or ASCII characters)
- `Bool` (logical `true` and `false`)

Any good programming language will let us construct compound types from simple ones like these. Let’s examine some of these operations, which are called type constructors.

### Product types

Most programmers will be familiar with product types, but in programming they are more commonly called *tuples* (the name “tuple” is a generalization of the terms “double”, “triple”, etc.). A product type is a compound type that is formed by a sequence of types, and is commonly denoted `(T1, T2, ..., Tn)` or `T1 × T2 × ... × Tn`. In set theory parlance, tuples correspond to Cartesian products, and that is why they’re called “product types.”

As an example, we could represent a point in 3-space by a Cartesian triple: `(Real, Real, Real)`.

Most programming languages will allow the elements of a tuple to be named (rather than indexed). This has the same type theoretic meaning, but named tuples are often called *records*, *structs*, or *classes*, depending on the programming language and context.

*Note:* In type theory, we usually do not distinguish between a 1-tuple (called a “singleton”) and the type of its element (however, most programming languages do make this distinction). This means that, for example, `(Int)` is usually equivalent to `Int` in the literature.

A special case of tuples is the 0-tuple, which is also called the unit type. The unit type can only hold one value (which means it stores zero information). It is sometimes called `Void` or `Unit`, and its value is often called `null`, `nil`, `none`, or `nothing`.

Common properties of the type product constructor:

- Closure: the product of two types is a type
- Associativity (sometimes): `(A × B) × C = A × (B × C)`
- Identity element: `Unit`

Depending on the type system, types and their constructors can form various algebraic structures, such as a monoid.
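To make products concrete in Python (a sketch; the `Point` record is a name I made up), a bare tuple is an anonymous product, and a `NamedTuple` is the same product with labeled fields:

```python
from typing import NamedTuple

# An anonymous product Real × Real × Real: a point in 3-space.
origin = (0.0, 0.0, 0.0)

# The same product as a record: the fields are named instead of indexed.
class Point(NamedTuple):
    x: float
    y: float
    z: float

p = Point(1.0, 2.0, 3.0)
print(p.y)   # access by name: 2.0
print(p[2])  # or by position: 3.0

# The 0-tuple is the unit type: () is its only value.
unit = ()
```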

### Sum types

Sum types, also called *tagged unions*, *disjoint unions*, or *variants*, are used to describe values that could be of one of several types (the corresponding set theoretic operation is union of disjoint sets). We usually write sum types as `T1 + T2 + ... + Tn`, `T1 | T2 | ... | Tn`, or `T1 ∪ T2 ∪ ... ∪ Tn`.

Many programmers use sum types without knowing it. For example, what is the return type of the following Python function?

```python
def f(flag):
    return 5 if flag else "Hello"
```

It seems that `f` could return an `int` or a `str`, so its type must be `int + str`.

As a special case, if we take the sum of zero types, we get a special type called `Bottom` or `⊥`, which can take on no values. A function which does not return (i.e., does not halt) is said to have a return type of `Bottom`. Likewise, the sum of all types is called `Top` or `⊤`, and it is sometimes called `Any`, `Object`, or `General`, because it can take on any value.

Note that, unlike product types, sum types can be defined inductively, as we shall see with lists and trees.
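In modern Python, this sum can be written down explicitly as a type hint, and consuming a value of a sum type amounts to case analysis (a sketch; `describe` is a name I made up):

```python
from typing import Union

# The annotation makes the sum type int + str explicit.
def f(flag: bool) -> Union[int, str]:
    return 5 if flag else "Hello"

# Consuming a value of a sum type requires case analysis on which
# summand the value belongs to:
def describe(value: Union[int, str]) -> str:
    if isinstance(value, int):
        return f"the number {value}"
    return f"the string {value!r}"

print(describe(f(True)))   # the number 5
print(describe(f(False)))  # the string 'Hello'
```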

Properties of the union constructor:

- Closure: the union of two types is a type
- Associativity (sometimes): `(A + B) + C = A + (B + C)`
- Commutativity: `A + B = B + A`
- Identity element: `Bottom`

You can see that types and the sum operation might form a commutative monoid, depending on whether `+` is associative.

### Algebraic data types

We can represent a number of different data structures using *algebraic data types*, which are simply sums of products of types. For example, we can represent a list of integers as

```
IntList = Unit + Int × IntList
```

where the product operation (`×`) has a higher precedence than the sum operation (`+`), and `Unit` is the unit type (which represents an empty list here). In English, this means: *a list of integers is either the empty list or an integer paired with a list of integers.*

Using a nominative type system, we can also represent enumerated types as the union of disjoint unit types:

```
Clubs    = Unit
Diamonds = Unit
Hearts   = Unit
Spades   = Unit

Suit = Clubs + Diamonds + Hearts + Spades
```

This only works in a nominative type system because, for example, `Clubs` and `Diamonds` need to be disjoint unit types, or else their union would still be the unit type (in type theory we talk about *the* unit type, but in a nominative type system we can have several unit types). In a structural type system, we would have `Suit = Clubs = Diamonds = Hearts = Spades`, which is not what we want.

We might define a binary tree of integers like this:

```
IntTree = Unit + Int × IntTree × IntTree
```

This reads: *a binary tree can be the empty tree or an integer and two subtrees.*

### Fun with algebra

The title of this article suggests that we can manipulate expressions involving types algebraically, as we do with numbers. To make this easier, let’s adopt the following notation:

- `1` is the unit type
- `T` is a primitive type
- `X + Y` is the sum of two types
- `X × Y` is the product of two types (which has a higher syntactic precedence than sum)

Additionally, we can let the rest of the natural numbers represent sums of unit types (e.g. `2 = 1 + 1 = Bool`). Repeated products can be written as exponentials (e.g. `X × X × X = X^3`). Now let’s describe some common data structures:

- `1 + T` is a nullable type (also called an option type or a maybe type)
- `L = 1 + T × L` is the type of lists
- `B = 1 + T × B × B` is the type of binary trees, like `IntTree` defined above
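Reading types as sizes, we can spot-check these counts in Python for a small `T` (a sketch; here `T = Bool`, so `T = 2`):

```python
from itertools import product

T = [False, True]  # Bool has 2 values, so T = 2

# 1 + T: a nullable Bool has 1 + 2 = 3 values.
nullable = [None] + T
print(len(nullable))  # 3

# T^3: triples of Bools have 2^3 = 8 values.
triples = list(product(T, repeat=3))
print(len(triples))  # 8

# 1 + T + T^2 + T^3: lists of length at most 3 have 1 + 2 + 4 + 8 = 15 values.
short_lists = [p for k in range(4) for p in product(T, repeat=k)]
print(len(short_lists))  # 15
```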

Take a look at that second one (lists). With a bit of hand-waving (treating type addition and multiplication like the corresponding operations on reals), we can solve for `L`:

```
L = 1 + T × L
L - T × L = 1
L × (1 - T) = 1
L = 1 / (1 - T)
L = 1 + T + T^2 + T^3 + ...
```

Interpreting that last line in set theoretic terms, we can translate it into English as: *a list is either empty, or it contains one element, or two elements, or three elements, etc.* Pretty cool, huh?

If we’re treating these symbols as numbers, then they correspond to the number of values that the corresponding types can take on (i.e. the sizes of the corresponding sets). For example, the unit type has 1 possible value, so in this notation it becomes `1`. The fact that `L = 1 + T + T^2 + T^3 + ...` does not converge simply means that there are infinitely many lists we can construct. In general, the algebraic forms of inductive datatypes do not converge.

I want to mention one more thing you can do with this notation: taking derivatives of types. Gérard Huet introduced zipper types in 1997, and Conor McBride later observed, in an unpublished note, that the “one-hole context” behind a zipper is the derivative of the underlying data type. Oddly, the derivative laws look very much like their classical counterparts from differential calculus! The one-hole context of a type is a data structure that represents a pointer to an item in a larger container type. For example, one-hole contexts can mark the position of an item in a linked list or a binary tree, and they can be used to implement such containers efficiently in a purely functional fashion.
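For example, applying the ordinary quotient rule to the algebraic form of the list type gives

\[ \frac{dL}{dT} = \frac{d}{dT} \left( \frac{1}{1 - T} \right) = \frac{1}{ \left( 1 - T \right) ^ 2 } = L \times L, \]

so a one-hole context for a list is a pair of lists: the elements before the hole and the elements after it. This is precisely the structure underlying Huet’s list zipper.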