March 4, 2016
Have you ever needed to memorize a phone number? IP address? Hex color code? I made just the thing for you!
Huffgram will convert any piece of text into a word phrase that is easier to memorize. You might use it to memorize a phone number, a password, or even some digits of π. When you need the original text back, Huffgram can recover it from the word phrase.
How it works
Using a corpus composed of a handful of modern classic novels (e.g., To Kill a Mockingbird), I constructed a bunch of Huffman trees. Each unique word in the corpus gets its own Huffman tree, which is generated from the probability distribution over the words that follow it. Thus, the Huffman trees approximate a bigram model. All of those Huffman trees (~900 KB of minified JSON) are loaded by the browser. You can download the file here.
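To make the construction concrete, here is a minimal Python sketch of building one Huffman tree per word from bigram counts. It is not the code behind Huffgram: the function names, the tuple-based tree representation, and the toy corpus are all assumptions made for illustration.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def build_huffman_tree(freqs):
    """Build a Huffman tree from a {word: count} mapping.

    Hypothetical representation: a leaf is a word (str) and an
    internal node is a (left, right) tuple.
    """
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(f, next(tiebreak), w) for w, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    return heap[0][2]


def build_bigram_trees(words):
    """Return {word: Huffman tree over the words that follow it}."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return {w: build_huffman_tree(c) for w, c in followers.items()}


corpus = "the cat sat on the mat the cat ran".split()
trees = build_bigram_trees(corpus)
# "the" is followed by "cat" twice and "mat" once, so its tree has two leaves.
# "sat" is only ever followed by "on", so its "tree" is the single leaf "on".
```

Note that a word with a single follower yields a one-leaf tree that consumes no bits, and the last word of the corpus may have no tree at all. A real implementation has to deal with both cases somehow; this sketch does not.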
Essentially, Huffgram interprets the input as compressed data and decompresses it using those Huffman trees. For each word \(w\), Huffgram “decompresses” the next word by traversing the Huffman tree for \(w\) according to the next few bits in the input: each bit picks the left or right branch, and reaching a leaf yields the next word. To choose the initial word, there is a separate Huffman tree generated from the probability distribution over words that start sentences in the corpus.
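The traversal can be sketched in a few lines of Python. Everything here is hypothetical: the tiny hand-written trees, the convention that 0 means left and 1 means right, and the function names are assumptions for illustration, not Huffgram's actual data or code. The sketch also assumes every tree has at least two leaves, so that each word consumes at least one bit.

```python
# Hypothetical toy trees: a leaf is a word, an internal node is (left, right).
START_TREE = ("the", ("a", "one"))
TREES = {
    "the": ("cat", "dog"),
    "a": ("cat", "dog"),
    "one": ("cat", "dog"),
    "cat": ("sat", "ran"),
    "dog": ("sat", ("ran", "slept")),
    "sat": ("the", "a"),
    "ran": ("the", "a"),
    "slept": ("the", "a"),
}


def bits_to_words(bits, start_tree, trees):
    """Walk the trees bit by bit; return (words, leftover_bits)."""
    words, pos, tree = [], 0, start_tree
    while True:
        node, word_start = tree, pos
        while isinstance(node, tuple):
            if pos == len(bits):
                # Ran out of input partway down a tree.
                return words, bits[word_start:]
            node = node[int(bits[pos])]  # 0 = left, 1 = right (assumed)
            pos += 1
        words.append(node)
        if pos == len(bits):
            return words, ""
        tree = trees[node]  # the next tree depends on the word just emitted


def code_table(tree, prefix=""):
    """Map each leaf word of a tree to its bit string."""
    if isinstance(tree, str):
        return {tree: prefix}
    return {**code_table(tree[0], prefix + "0"),
            **code_table(tree[1], prefix + "1")}


def words_to_bits(words, start_tree, trees):
    """Inverse of bits_to_words for inputs that end exactly on a word."""
    bits, tree = [], start_tree
    for w in words:
        bits.append(code_table(tree)[w])
        tree = trees[w]
    return "".join(bits)


words, leftover = bits_to_words("0110", START_TREE, TREES)
```

With these toy trees, the bits `0110` walk to "the", then "dog", then "ran", and `words_to_bits` maps that phrase back to the same bits. An input such as `011` stops partway down the tree for "dog" and leaves one bit unconsumed.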
You might wonder what happens if, at the end of the input, there aren’t enough bits to reach the bottom of the last Huffman tree. We could try finishing the path in some arbitrary way (e.g., always go left) to get to a word, but then the mapping wouldn’t be invertible, because the decoder couldn’t tell the padding bits from real input. So we might have some bits at the end which don’t complete a word. To resolve that, we keep a special list of end words, and we treat those final bits as an index into that list.
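One detail an index scheme has to handle is that leftover bit strings such as `1` and `01` must map to different end words, even though they have the same numeric value. The Python sketch below shows one possible way to do that, by prefixing a 1 bit before converting to an integer. This is an assumption for illustration and not necessarily how Huffgram builds its index; the word list is made up too.

```python
# Hypothetical end-word list. Index 0 is never used by this scheme.
END_WORDS = ["zero", "alpha", "bravo", "charlie",
             "delta", "echo", "foxtrot", "golf"]


def leftover_to_end_word(leftover):
    """Map leftover bits (a str of '0'/'1', possibly empty) to an end word.

    Prefixing a '1' keeps leading zeros significant, so "1" and "01"
    get different indices (3 and 5).
    """
    index = int("1" + leftover, 2)
    return END_WORDS[index]


def end_word_to_leftover(word):
    """Inverse mapping: recover the leftover bits from the end word."""
    index = END_WORDS.index(word)
    return bin(index)[3:]  # strip the '0b' prefix and the sentinel '1'


# Round trip for every leftover of up to two bits.
for leftover in ["", "0", "1", "00", "01", "10", "11"]:
    assert end_word_to_leftover(leftover_to_end_word(leftover)) == leftover
```

Under this scheme a list of 2^d entries covers up to d − 1 leftover bits, so the list has to be sized to the deepest tree. An empty leftover also gets its own end word, which lets the decoder assume the last word of every phrase is an end word.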