October 29, 2014
Reddit user TheBananaKing proposes a compelling alternative to ads for generating revenue on popular websites:
I’m curious—let’s do some Fermi calculation.
Reddit gets 5 million pageviews per hour. Let’s assume each pageview lasts 2 minutes on average. So at any point in time, there are 167k people on reddit.
That’s a huge limitation. You will have to break down the computation into tiny chunks, and that’ll incur significant overhead. The problem had better be embarrassingly parallel.
A project called [email protected] uses a similar distributed army of compute nodes to run protein folding simulations. According to their website, this helps scientists find cures for Alzheimer’s, Huntington’s, Parkinson’s, and many cancers. Noble, but it doesn’t generate revenue, so it won’t replace ads. An earlier project, [email protected], allows people to volunteer their CPU time to help search for extra-terrestrial intelligence. But, like [email protected], it’s not a business model.
What other kinds of problems could we solve on this platform? Anecdotally, when people buy “compute power” on platforms like Amazon EC2, their usage tends to fall into one of three categories:
- Web server
- Asynchronous background jobs
We could try to use this collective compute power for CPU-intensive things like resizing images, transcoding video, etc. A video service like YouTube could buy reddit compute power to transcode videos as they are uploaded by users. Perhaps each client gets a few seconds of video to transcode, and they are stitched together by YouTube servers.
It would almost certainly be more efficient for YouTube to just transcode the videos on the same servers that would otherwise be doing this “stitching”. Moreover, it wouldn’t work for private/unlisted videos—because clients can access the video data.
There is a newly active area of research called homomorphic encryption, which allows a client to perform computations on encrypted data without decrypting it. With fully homomorphic encryption, a client could in principle transcode private video data without compromising its owner’s privacy. This Microsoft paper describes a state-of-the-art homomorphic encryption scheme—it’s impressive, but only performant for simple computations like doing linear combinations of vectors. Video transcoding is a big leap from that—especially considering we don’t have an efficient way to do division homomorphically. For now, we have to restrict our computation to work with data that is public.
A related problem is that we don’t even know if the browser faithfully executes the computation! It’s trivial for a malicious agent to mutate the data, modify the computation, or otherwise sabotage the results. Another active area of research, called verifiable computing, attempts to solve this problem—but, like homomorphic encryption, it’s too nascent to be practical for now. Though, it’s a two-for-one deal: it can be shown that if we can make homomorphic encryption practical, we get verifiable computing for free. Until then, our distributed computation also needs to be independently verifiable—without re-doing the whole computation. NP-hard problems are good candidates. Alternatively, we could verify computations by duplicating work across many clients and checking that the results match, but this reduces the effective size of our botnet and increases the surface area for attack.
An interesting development in browser technology is called WebRTC. It enables browsers do peer-to-peer networking. With our 100k active visitors, we could use it to run a pretty substantial mesh network. For example, we might be able to run slimmed-down bitcoin nodes. While interesting, operating a bitcoin node is not a lucrative business.
But what about bitcoin mining? It meets our two technical criteria: it’s embarrassingly parallel and efficiently verifiable. The current bitcoin mining difficulty is 35,985,640,265. That means it takes, on average, 154 quintillion hashes to mine a block. The current block reward is 25 coins, and the current market price for bitcoins is about $355. So, 154 quintillion hashes is worth $8,875. The fastest software miner can compute 75 million hashes per second. With our 100k slave machines, thats 7.5 trillion hashes per second. It would take about 20 million seconds, or 231 days, to mine a block. That’s $38 per day. With ads, a site like reddit could make that much in about a second.
So far, our browser botnet isn’t looking very useful. I can think of another set of problems which are both parallelizable and verifiable: compilation, optimization, and code generation. Each client could be sent a single source file to compile, and a fairly well-understood idea called proof-carrying code could make this possible. The idea is that the compiler emits the compiled code together with a proof that the compiled code correctly refines the source. So, we don’t have to trust the clients! If we’re willing to let clients see the source code, we could build a project with thousands of source files in just a few seconds with our 100k-node compiler army.
So, a large site like reddit could compile open-source projects quickly—but we have continuous integration systems for that.