November 25, 2017
I’m a software engineer at a place. I like the work and the people, and I learn a lot from my teammates. Many of them work very hard, so much that they don’t enjoy programming for fun anymore. I still love recreational programming, but in a peculiar sense.
When I come home from work, I try to prove theorems in a proof assistant. Usually the theorems are related to functional programming or type systems.
It’s a masochistic hobby. Convincing a computer that a theorem is true can be quite difficult compared to convincing a human. But if I can get a computer to accept my proof, then a) I must have a pretty good understanding of it, and b) it really must be right!
I use the Coq theorem prover for formalizing proofs. Coq is a beautiful and simple language —far simpler than the languages I use at work! Its beauty is hidden beneath a rather unsightly IDE :
The proof is on the left. The green highlighted part has been verified by Coq. As I am working on a proof, I can ask Coq to step forward through it with
Ctrl+Down, and I can use
Ctrl+Upto step backward. At any point if my proof is wrong, Coq will highlight the error in red. It’s similar to how a type checker works—actually, it’s exactly the same in a very deep sense. I’d love to explain what I mean by that, but I’ll have to save that for separate article.
In the screenshot, I’m currently in the middle of a proof. The upper right pane shows what I need to prove (in this case,
f x = x) and also what hypotheses I can use to prove it (shown above the horizontal line). The lower right pane shows helpful information if I request it, such as the type of something or a list of definitions/lemmas/theorems related to it.
Coq is not a toy, although it can be fun like one. Many serious results have been proven using it, including the four color theorem and the correctness of a certain C compiler.
Recently I set up a small GitHub repository with a continuous integration system that checks my proofs when I push to a branch, preventing me from merging it into master if there is a mistake. It’s impossible to commit incorrect math to this repository !
Proving theorems in a machine-checked language is invigorating once you get the hang of it. It feels like doing a puzzle. And when you finish a long proof of a nontrivial result, it’s a remarkable sensation unlike anything else.
Case study: domain theory
When you write down a recursive function in any language, it’s actually not obvious that such a definition is mathematically well-defined. This week I was reading about the theoretical underpinning of recursive definitions: fixed points. Whenever you define something recursively, you are technically taking a fixed point of a continuous function (yes, you read that correctly!). There is some really cool math behind this called domain theory.
Before this week, I had a vague understanding of all this. I do a fair amount of functional programming, but I hardly knew anything about domain theory. Today, I finished proving the main result: the Kleene fixed-point theorem. This was a difficult proof for me, but now I consider myself pretty well-versed in the basic ideas of domain theory.
The proof is not very remarkable on paper, but convincing a computer to accept it is another matter entirely. I started by writing down the definitions below. The first step to understanding a theory is being able to formally define the main ideas! I don’t expect the casual reader to dissect all these definitions, but I include them here to give you a sense of what it looks like to write them in Coq:
(* Assumption: Let (T, leq) be a partially ordered set, or poset. A poset is a set with a binary relation which is reflexive, transitive, and antisymmetric. *) Parameter T : Set. Parameter leq : T -> T -> Prop. Axiom refl : forall x, leq x x. Axiom trans : forall x y z, leq x y -> leq y z -> leq x z. Axiom antisym : forall x y, leq x y -> leq y x -> x = y. (* A supremum of a subset of T is a least element of T which is greater than or equal to every element in the subset. This is also called a join or least upper bound. *) Definition supremum P x1 := (forall x2, P x2 -> leq x2 x1) /\ forall x3, (forall x2, P x2 -> leq x2 x3) -> leq x1 x3. (* A directed subset of T is a non-empty subset of T such that any two elements in the subset have an upper bound in the subset. *) Definition directed P := (exists x1, P x1) /\ forall x1 x2, P x1 -> P x2 -> exists x3, leq x1 x3 /\ leq x2 x3 /\ P x3. (* Assumption: Let the partial order be directed-complete. That means every directed subset has a supremum. *) Axiom directedComplete : forall P, directed P -> exists x, supremum P x. (* Assumption: Let T have a least element called bottom. This makes our partial order a pointed directed-complete partial order. *) Parameter bottom : T. Axiom bottomLeast : forall x, leq bottom x. (* A monotone function is one which preserves order. We only need to consider functions for which the domain and codomain are identical and have the same order relation, but this need not be the case for monotone functions in general. *) Definition monotone f := forall x1 x2, leq x1 x2 -> leq (f x1) (f x2). (* A function is Scott-continuous if it preserves suprema of directed subsets. We only need to consider functions for which the domain and codomain are identical and have the same order relation, but this need not be the case for continuous functions in general. *) Definition continuous (f : T -> T) := forall P x1, directed P -> supremum P x1 -> supremum (fun x2 => exists x3, P x3 /\ x2 = f x3) (f x1). (* This function performs iterated application of a function to bottom. *) Fixpoint approx f n := match n with | 0 => bottom | S n => f (approx f n) end.
Next, I proved a few lemmas. To keep the article short, I will only state them; the full proofs are here.
(* We will need this simple lemma about pairs of natural numbers. *) Lemma natDiff : forall n1 n2, exists n3, n1 = add n2 n3 \/ n2 = add n1 n3. (* The supremum of a subset of T, if it exists, is unique. *) Lemma supremumUniqueness : forall P x1 x2, supremum P x1 -> supremum P x2 -> x1 = x2. (* Scott-continuity implies monotonicity. *) Lemma continuousImpliesMonotone : forall f, continuous f -> monotone f. (* Iterated applications of a monotone function f to bottom form an ω-chain, which means they are a totally ordered subset of T. This ω-chain is called the ascending Kleene chain of f. *) Lemma omegaChain : forall f n m, monotone f -> leq (approx f n) (approx f m) \/ leq (approx f m) (approx f n). (* The ascending Kleene chain of f is directed. *) Lemma kleeneChainDirected : forall f, monotone f -> directed (fun x2 => exists n, x2 = approx f n).
To give you an idea of what an actual proof looks like in Coq, below is a proof of one of the easier lemmas above. The proof is virtually impossible to read without stepping through it interactively, so don’t worry if it doesn’t make any sense to you here.
Lemma supremumUniqueness : forall P x1 x2, supremum P x1 -> supremum P x2 -> x1 = x2. Proof. intros. unfold supremum in H; destruct H. unfold supremum in H0; destruct H0. apply H1 in H0. apply H2 in H. apply antisym; auto. Qed.
And finally, this is the main result I proved (the proof is here):
(* The Kleene fixed-point theorem states that the least fixed-point of a Scott- continuous function over a pointed directed-complete partial order exists and is the supremum of the ascending Kleene chain. *) Theorem kleene : forall f, continuous f -> exists x1, supremum (fun x2 => exists n, x2 = approx f n) x1 /\ f x1 = x1 /\ (forall x2, f x2 = x2 -> leq x1 x2).
There’s nothing better than typing
Qedat the end of a long proof and watching it turn green as Coq verifies it. What’s really amazing to me is that Stephen Kleene probably proved this without the help of a computer, but I was able to verify it with the highest possible scrutiny. It’s as if he wrote an impressive program without ever having run it, and it turned out not to have any bugs! This is more than just an analogy. When you work with Coq, you become intimately familiar with the interpretation of propositions as types and proofs as programs.
Case study: soundness of a type system
A successful formula for writing papers about programming languages theory is the following:
- Define the syntax of a small but novel language (usually based on the lambda calculus).
- Define a type system for the language (a set of rules that specify which programs are legal).
- Define a semantics for the language (i.e., describe what happens when you run a program).
- Prove that the type system prevents programs from “getting stuck” according to the semantics.
The theorem that is proven in that last step is called soundness. A few months ago, I wanted to try doing a formally-verified soundness proof.
The simply-typed lambda calculus is essentially the smallest programming language with higher-order functions and a meaningful type system . I started by defining the abstract syntax of the language in Coq:
Inductive term : Set := | eUnit : term | eVar : id -> term | eAbs : id -> type -> term -> term | eApp : term -> term -> term with type : Set := | tUnit : type | tArrow : type -> type -> type.
To keep this article short, I won’t describe how the language works. There are many good introductions. My favorite is Chapter 9 of Benjamin C. Pierce’s wonderful book called Types and Programming Languages.
Below are the typing rules. You can add whatever base types you like (e.g., Booleans, integers, etc.) to the simply-typed lambda calculus, but to keep things simple I only have a rather useless “unit” type which has only a single value (akin to an empty tuple).
Inductive hasType : context -> term -> type -> Prop := | htUnit : forall c, hasType c eUnit tUnit | htVar : forall x t c, lookupVar c x = Some t -> hasType c (eVar x) t | htAbs : forall e x t1 t2 c, hasType (cExtend c x t1) e t2 -> hasType c (eAbs x t1 e) (tArrow t1 t2) | htApp : forall e1 e2 t1 t2 c, hasType c e1 t1 -> hasType c e2 (tArrow t1 t2) -> hasType c (eApp e2 e1) t2.
To specify the semantics, I defined a small-step operational semantics. The main idea is that you define a relation which says what steps a program takes when it is run. There are many ways to define the semantics of a programming language, and this is just one of them.
Inductive step : term -> term -> Prop := | sBeta : forall e1 e2 x t, value e2 -> step (eApp (eAbs x t e1) e2) (sub e1 x e2) | sApp1 : forall e1 e2 e3, step e1 e2 -> step (eApp e1 e3) (eApp e2 e3) | sApp2 : forall e1 e2 e3, value e1 -> step e2 e3 -> step (eApp e1 e2) (eApp e1 e3).
How do we know when a program has finished running? We just have to define what kinds of programs are “fully evaluated.”
Inductive value : term -> Prop := | vUnit : value eUnit | vAbs : forall e x t, value (eAbs x t e).
This states that a value (i.e., an expression which has been fully evaluated) is either unit constant or a function. So function applications are not values, since they can be reduced.
An expression is in normal form if there are no more steps it can take:
Definition normalForm e1 := ~ exists e2, step e1 e2.
A stuck program is one which is in normal form but is not a value:
Definition stuck e := normalForm e /\ ~ value e.
Soundness means a program that is well-typed will never get stuck (
stepStaris the transitive reflexive closure of the
steprelation defined above):
Theorem soundness : forall e1 e2 t, hasType cEmpty e1 t -> stepStar e1 e2 -> ~ stuck e2.
The proof, including definitions and lemmas, is about 500 lines long. For such a simple language, this was quite a difficult undertaking! I recall working on this for an entire weekend, probably 8 hours both days. Now I really appreciate when research papers include a formally verified soundness proof.
I think of Coq as an extension of my own ability to reason logically. When I’m in doubt about something, I open the Coq IDE and try to prove it. I think the reason this is so valuable to me is that I often mull over functional programming, types, logic, algorithms, etc. These kinds of things are well-suited to formalization.
Unfortunately, machine-checked theorem proving is not very easy to learn, which is probably why it’s so esoteric. It’s difficult to prove nontrivial results without a graduate-level understanding of type theory (even then it’s still difficult). The most accessible introduction is Benjamin C. Pierce’s Software Foundations, which is freely available online. After that, read Adam Chlipala’s Certified Programming with Dependent Types, which is also freely available online.
I am not an expert in Coq by any means, and my proofs are probably longer and clunkier than they need to be. I only know basic techniques, but a nice thing about logic is that the underlying rules are very simple. I have reached a point where I can prove anything in Coq that I can rigorously prove on paper (given enough time). The two proofs I described in this article took me a few days each, and I’m steadily getting better.
- Strictly speaking, the language itself is called Gallina, and “Coq” refers to the system as a whole.
- There is also an Emacs-based interface, but I’m a Vim user. A friend helped me set it up with Spacemacs, but I still found navigating proofs in CoqIDE to be more a little more intuitive.
- I should know better than to say it’s impossible. Coq allows you to add your own axioms, which is dangerous because your axioms might be inconsistent.
- One could argue that the SK combinator calculus is smaller and still fits this description.