Counting Stars (OneRepublic)
December 13, 2014
I recorded a quick piano cover of Counting Stars by OneRepublic. Sorry for the portrait orientation and clumsy mistakes!
Selling CPU-time instead of ad space
October 29, 2014
Reddit user TheBananaKing proposes a compelling alternative to ads for generating revenue on popular websites:
Instead of selling ad space on websites, sell distributed computing time on client PCs viewing the page. Javascript-powered, sure—but on a big site like reddit, I’m guessing the spare processing capacity of all the users would still be fairly immense.
It sounds great in principle: everyone viewing the website runs JavaScript computations in the background, and interested parties bid on this computational power. Could it replace ads as a means to generate revenue for a high-traffic website?
I’m curious—let’s do a quick Fermi calculation.
Reddit gets 5 million pageviews per hour. Let’s assume each pageview lasts 2 minutes on average. So at any point in time, there are 167k people on reddit.
What can you do with 167k machines running JavaScript? Let’s say the code is highly optimized asm.js. That’s like having 100k machines running native code (assuming asm.js runs at roughly 60% of native speed), with the caveat that each only runs for a couple minutes on average and might die at any moment.
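Spelling out the back-of-the-envelope arithmetic (the pageview, session-length, and asm.js-speed figures are just the rough assumptions above):

// Rough estimate of concurrent reddit visitors and their effective compute power.
var pageviewsPerHour = 5e6;      // reddit pageviews per hour
var minutesPerPageview = 2;      // assumed average time spent on a page
var concurrentVisitors = pageviewsPerHour * minutesPerPageview / 60;  // ~167k
var asmJsSpeedFactor = 0.6;      // assume asm.js runs at ~60% of native speed
var nativeEquivalentMachines = concurrentVisitors * asmJsSpeedFactor; // ~100k
console.log(concurrentVisitors, nativeEquivalentMachines);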
That’s a huge limitation. You will have to break down the computation into tiny chunks, and that’ll incur significant overhead. The problem had better be embarrassingly parallel.
A project called Folding@home uses a similar distributed army of compute nodes to run protein folding simulations. According to their website, this helps scientists find cures for Alzheimer’s, Huntington’s, Parkinson’s, and many cancers. Noble, but it doesn’t generate revenue, so it won’t replace ads. An earlier project, SETI@home, allows people to volunteer their CPU time to help search for extra-terrestrial intelligence. But, like Folding@home, it’s not a business model.
What other kinds of problems could we solve on this platform? Anecdotally, when people buy “compute power” on platforms like Amazon EC2, their usage tends to fall into one of three categories:
1. Web server
2. Database
3. Asynchronous background jobs
Due to limitations of JavaScript running in the browser, Option 1 is not technically possible. Option 2 is pretty unpromising too, because each client (and its associated storage) can become unavailable at any time—without any guarantee that it will come back online later. At best, it’s an extremely slow and unreliable cache. What about Option 3?
We could try to use this collective compute power for CPU-intensive things like resizing images, transcoding video, etc. A video service like YouTube could buy reddit compute power to transcode videos as they are uploaded by users. Perhaps each client gets a few seconds of video to transcode, and they are stitched together by YouTube servers.
It would almost certainly be more efficient for YouTube to just transcode the videos on the same servers that would otherwise be doing this “stitching”. Moreover, it wouldn’t work for private or unlisted videos, because the clients doing the transcoding would be able to see the video data.
There is a newly active area of research called homomorphic encryption, which allows a client to perform computations on encrypted data without decrypting it. With fully homomorphic encryption, a client could in principle transcode private video data without compromising its owner’s privacy. This Microsoft paper describes a state-of-the-art homomorphic encryption scheme—it’s impressive, but only performant for simple computations like doing linear combinations of vectors. Video transcoding is a big leap from that—especially considering we don’t have an efficient way to do division homomorphically. For now, we have to restrict our computation to work with data that is public.
A related problem is that we don’t even know if the browser faithfully executes the computation! It’s trivial for a malicious agent to mutate the data, modify the computation, or otherwise sabotage the results. Another active area of research, called verifiable computing, attempts to solve this problem—but, like homomorphic encryption, it’s too nascent to be practical for now. Though, it’s a two-for-one deal: it can be shown that if we can make homomorphic encryption practical, we get verifiable computing for free. Until then, our distributed computation also needs to be independently verifiable—without re-doing the whole computation. Problems in NP are good candidates, since their solutions can be verified in polynomial time even when finding them is hard. Alternatively, we could verify computations by duplicating work across many clients and checking that the results match, but this reduces the effective size of our botnet and increases the surface area for attack.
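To make the duplication idea concrete, here is a minimal sketch (a hypothetical helper, not part of any real system): each chunk is sent to several clients, and a result is only trusted if a strict majority of them agree on it.

// Tally the results returned by the clients that were all given the same chunk,
// and accept an answer only if a strict majority of them agree on it.
function verifyByReplication(results) {
  var counts = {};
  results.forEach(function (result) {
    var key = JSON.stringify(result);
    counts[key] = (counts[key] || 0) + 1;
  });
  var best = null;
  Object.keys(counts).forEach(function (key) {
    if (best === null || counts[key] > counts[best]) {
      best = key;
    }
  });
  return best !== null && counts[best] > results.length / 2 ? JSON.parse(best) : null;
}

// Example: two of the three clients agree, so the result 42 is accepted.
verifyByReplication([42, 42, 7]); // => 42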
An interesting development in browser technology is called WebRTC. It enables browsers to do peer-to-peer networking. With our 100k active visitors, we could use it to run a pretty substantial mesh network. For example, we might be able to run slimmed-down bitcoin nodes. While interesting, operating a bitcoin node is not a lucrative business.
But what about bitcoin mining? It meets our two technical criteria: it’s embarrassingly parallel and efficiently verifiable. The current bitcoin mining difficulty is 35,985,640,265. That means it takes, on average, 154 quintillion hashes to mine a block. The current block reward is 25 coins, and the current market price for bitcoins is about $355. So, 154 quintillion hashes is worth $8,875. The fastest software miner can compute 75 million hashes per second. With our 100k slave machines, that’s 7.5 trillion hashes per second. It would take about 20 million seconds, or 231 days, to mine a block. That’s $38 per day. With ads, a site like reddit could make that much in about a second.
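For reference, here is that arithmetic spelled out (the difficulty, price, and hash-rate figures are the late-2014 numbers quoted above, and the rounding differs slightly):

// Back-of-the-envelope revenue from mining bitcoin with the browser botnet.
var difficulty = 35985640265;                           // current mining difficulty
var hashesPerBlock = difficulty * Math.pow(2, 32);      // ~1.5e20 hashes per block
var blockRewardUsd = 25 * 355;                          // 25 coins at ~$355 each, ~$8,875
var hashesPerSecond = 75e6 * 100e3;                     // 75 MH/s per machine x 100k machines
var secondsPerBlock = hashesPerBlock / hashesPerSecond; // ~2.1e7 seconds
var daysPerBlock = secondsPerBlock / 86400;             // ~240 days
console.log(blockRewardUsd / daysPerBlock);             // roughly $37 per day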
So far, our browser botnet isn’t looking very useful. I can think of another set of problems which are both parallelizable and verifiable: compilation, optimization, and code generation. Each client could be sent a single source file to compile, and a fairly well-understood idea called proof-carrying code could make the results verifiable: the compiler emits the compiled code together with a proof that the compiled code correctly refines the source, so we don’t have to trust the clients! If we’re willing to let clients see the source code, we could build a project with thousands of source files in just a few seconds with our 100k-node compiler army.
So, a large site like reddit could compile open-source projects quickly—but we have continuous integration systems for that.
Automated theorem proving in python
October 8, 2014
I made an automated theorem prover for first-order logic. For any provable formula, it’s guaranteed to find the proof (eventually). However, as a consequence of the negative answer to Hilbert’s Entscheidungsproblem, there are some unprovable formulae that will cause it to loop forever.
This is one of my favorite personal projects. Check it out on GitHub!
Here is a transcript of an interactive session demonstrating what it can do:
> P or not P
0. ⊢ (P ∨ ¬P)
1. ⊢ P, ¬P
2. P ⊢ P
Formula proven: (P ∨ ¬P).
> P and not P
0. ⊢ (P ∧ ¬P)
1. ⊢ P
Formula unprovable: (P ∧ ¬P).
> forall x. P(x) implies (Q(x) implies P(x))
0. ⊢ (∀x. (P(x) → (Q(x) → P(x))))
1. ⊢ (P(v1) → (Q(v1) → P(v1)))
2. P(v1) ⊢ (Q(v1) → P(v1))
3. Q(v1), P(v1) ⊢ P(v1)
Formula proven: (∀x. (P(x) → (Q(x) → P(x)))).
> exists x. (P(x) implies forall y. P(y))
0. ⊢ (∃x. (P(x) → (∀y. P(y))))
1. ⊢ (P(t1) → (∀y. P(y))), (∃x. (P(x) → (∀y. P(y))))
2. P(t1) ⊢ (∀y. P(y)), (∃x. (P(x) → (∀y. P(y))))
3. P(t1) ⊢ (∀y. P(y)), (P(t2) → (∀y. P(y))), (∃x. (P(x) → (∀y. P(y))))
4. P(t1) ⊢ (P(t2) → (∀y. P(y))), (∃x. (P(x) → (∀y. P(y)))), P(v1)
5. P(t1), P(t2) ⊢ (∀y. P(y)), (∃x. (P(x) → (∀y. P(y)))), P(v1)
6. P(t1), P(t2) ⊢ (∀y. P(y)), (P(t3) → (∀y. P(y))), (∃x. (P(x) → (∀y. P(y)))), P(v1)
7. P(t1), P(t2) ⊢ (P(t3) → (∀y. P(y))), P(v2), (∃x. (P(x) → (∀y. P(y)))), P(v1)
8. P(t3), P(t1), P(t2) ⊢ (∀y. P(y)), P(v2), (∃x. (P(x) → (∀y. P(y)))), P(v1)
t3 = v1
Formula proven: (∃x. (P(x) → (∀y. P(y)))).
> axiom forall x. Equals(x, x)
Axiom added: (∀x. Equals(x, x)).
> axioms
(∀x. Equals(x, x))
> lemma Equals(a, a)
0. (∀x. Equals(x, x)) ⊢ Equals(a, a)
1. Equals(t1, t1), (∀x. Equals(x, x)) ⊢ Equals(a, a)
t1 = a
Lemma proven: Equals(a, a).
> lemmas
Equals(a, a)
> remove forall x. Equals(x, x)
Axiom removed: (∀x. Equals(x, x)).
This lemma was proven using that axiom and was also removed: Equals(a, a)
Garnet: a tiny template engine for Node.js
August 28, 2014
This afternoon I built Garnet: a fast and minimalistic template engine for Node. The syntax is like eRuby, so it will be familiar to Rails developers. It supports layouts, unlike EJS. And unlike Jade, it lets you write raw HTML and gives you full control over every character.
Installation
$ npm install garnet
Features
- Compatible with Express
- Performant due to precompilation and caching
- Evaluate JavaScript (e.g., for conditionals and loops):
<% code %>
- Evaluate and embed (with sanitization):
<%= code %>
- Evaluate and embed (without sanitization):
<%- code %>
- Render a template from within a template:
<%- render(path, locals) %>
API
Compiling and rendering
To compile a template (or fetch an already-compiled template from cache):
var template = garnet.compile(path);
To render a template:
var output = template(locals);
To render a template from within another template (and compile it if necessary):
<%- render(path, locals) %>
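Putting those together, a typical flow looks something like this (the template name and locals are just for illustration):

var garnet = require('garnet');

// Compile (or fetch from cache) views/index.garnet, then render it with some locals.
var template = garnet.compile('index.garnet');
var output = template({ name: 'Stephan Boyer' });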
Default template directory
By default, Garnet looks in ./views for unqualified template names. If you want to change the default path to ./templates, for example, use:
garnet.templateDir = path.join(process.cwd(), 'templates');
Default template extension
If you refer to a view without a file extension, Garnet assumes .garnet by default. You can change this like so:
garnet.templateExt = '.html';
Caching
By default, Garnet will only load and compile a template once. If you want Garnet to reload and recompile templates whenever they are rendered, you can do so with:
garnet.enableCaching = false;
This is useful for development (you don’t need to restart the server for every change), but you should leave caching enabled in production.
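For example, one common pattern is to key this off the environment (just a sketch; adapt it to your own setup):

// Recompile templates on every render in development, but cache them in production.
garnet.enableCaching = process.env.NODE_ENV === 'production';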
Examples
Using Garnet with Express
To render a view with Express:
app.get('/', function(req, res) { res.render('index.garnet'); });
If you want to omit the .garnet extension from the line above, you can tell Express to assume it:
app.set('view engine', 'garnet');
If you want to use a different file extension (e.g., .html) for views, use this:
app.set('view engine', 'html');       // Tell Express to assume this extension
app.engine('html', garnet.__express); // Tell Express to use Garnet for this extension
garnet.templateExt = '.html';         // Tell Garnet to assume this extension
Locals
You can pass data to a view using the locals argument. For example, in app.js:
res.render('user.garnet', { name: 'Stephan Boyer' });
In views/user.garnet:
Name: <%= locals.name %>
Conditionals
<% if (user) { %> Name: <%= user.name %> <% } %>
Loops
<% users.forEach(function(user) { %> Name: <%= user.name %> <% }); %>
Layouts
We simply pass the name of the view to the layout as a local:
<!DOCTYPE html>
<html>
  <head>
    <title>Layout Demo</title>
  </head>
  <body>
    <%- render(locals.view, locals) %>
  </body>
</html>
In Express, you might render a view with this layout as follows:
app.get('/', function(req, res) { res.render('layout.garnet', { view: 'index.garnet' }); });
Let It Go (Frozen) + Let Her Go (Passenger)
May 29, 2014
This evening I did a quick mashup of two similarly titled songs: Let It Go from Frozen and Let Her Go by Passenger. I didn’t have sheet music for either song, so I apologize for any wrong notes—my ears are not very trustworthy!