Scrap the database?

Serious progress in the Exquisite Haiku performance melodrama. After talking with Wayne and Eric on Monday – and after flirting with the idea yesterday of completely scrapping Mongo and rebuilding the whole thing on a different store – I think I stumbled across a viable solution to the problem. After lunch in the fellows’ lounge yesterday, Wayne and I got talking about large-query performance in NoSQL databases, and which of the many options (Mongo/Redis/Riak/Couch/Cassandra) would be best for low-write, high-read storage – and, more narrowly, for the specific task of popping out a large collection of records (in the area of 10,000) really, really fast.

Wayne made the point that most of the NoSQL stores try to excel at a very specific task or suite of tasks – with NoSQL, there’s a move away from the monolithic, one-size-fits-all approach taken by established SQL stores like MySQL and Postgres. Redis aims for really fast read access; Couch lets you define “views” that specify in advance the structure the data needs to have on retrieval; Riak focuses on extreme distribution.

Really, though, all of this is beside the point. Exquisite Haiku will almost certainly never be run as a public-facing “service” that would be in the position of accumulating hundreds of terabytes of data scattered across dozens of servers. There’s too much computation, too often, for that kind of deployment to be feasible – and beyond the question of whether or not it would be possible, it’s not really a priority for the project. Instead, the challenge here is just the matter of moving the core vote data over the wire from the database to the application code fast enough for the scoring routine to keep up with the slicer, whose interval would ideally max out at about 1000ms. The problem is just the access issue – how to get the current vote data booted into memory and handed over to v8 for scoring?
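To make the constraint concrete, here’s a rough sketch of the shape of the problem – the names, the stub scoring function, and the fixed 1000ms interval are all just illustrative placeholders, not the actual Oversoul code:

```javascript
// Rough sketch of the timing constraint (illustrative names only).
// The slicer fires on a fixed interval; each scoring pass has to
// finish comfortably inside that window.

var SLICE_INTERVAL = 1000; // ms – the ceiling the scoring routine has to beat

function scoreRound(roundId, callback) {
  // Stand-in for the real scoring routine: gather the current vote
  // data, recompute the word rankings, push results to the clients.
  callback(null);
}

setInterval(function () {
  var start = Date.now();
  scoreRound('current-round', function () {
    var elapsed = Date.now() - start;
    if (elapsed > SLICE_INTERVAL) {
      console.warn('Scoring overran the slice by ' + (elapsed - SLICE_INTERVAL) + 'ms');
    }
  });
}, SLICE_INTERVAL);
```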

Wayne made a key insight – the vote data is completely static in the sense that it is never updated. The set of point allocations that need to be computed for a given slice in a given round is constantly expanding as more and more allocations are posted back from the clients, but the content of any given vote stays the same for the entire duration of the round once it is applied. There’s a big bucket of data that needs to be reliably cordoned off from the other buckets of data in the application, but once you toss something into the bucket you never need to get it back out individually and work with it – you just need to walk the contents of the entire bucket. The application needs to work with the set of vote data for a round as a clump, but never with subsets.
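In code, that access pattern looks something like this – an append-only bucket per round, written one vote at a time and only ever read by walking the whole thing (the names and vote fields here are invented for the sketch, not the actual data model):

```javascript
// One append-only array of votes per round, keyed by round id.
var votes = {};

// Votes get tossed into the bucket as they arrive...
function recordVote(roundId, word, quantity) {
  if (!votes[roundId]) votes[roundId] = [];
  votes[roundId].push({ word: word, quantity: quantity, applied: Date.now() });
}

// ...and are never touched individually again. The only read is a
// walk over the entire bucket at scoring time.
function walkVotes(roundId, fn) {
  (votes[roundId] || []).forEach(fn);
}
```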

So, if the hurdle is getting the constantly-expanding-but-never-changing lump of data into memory…why have it ever leave memory in the first place? If it all needs to be available in real time in memory for the slicer, why go to the trouble of sending it off to the database, only to have to pull it all back over the wire every few hundred milliseconds when the scoring routine executes?

The solution, I realized, is just to push the votes onto a big object, keyed by round id, sitting on the node global object, and work directly on the in-memory data. With this approach, all the scoring routine has to do is run a single findById query to get the poem record (negligible from a time-budget standpoint, a handful of milliseconds at most), then read the current round id off of the retrieved document and use it to key into the votes object on the global namespace. In the end, I get the same per-slice performance that I was seeing when I tried to load all of the data into a big, all-encompassing poem super-document, but without the atomicity chaos that results from trying to make lots of concurrent writes on a physically large document on the disk.
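Roughly, the flow per slice looks like this – a sketch assuming Mongoose, where the property names on the global object and on the poem and vote documents are placeholders rather than the real Oversoul schema:

```javascript
var mongoose = require('mongoose');

// Assumes a Poem model has been registered elsewhere in the application.
var Poem = mongoose.model('Poem');

// In-memory vote store, keyed by round id – lives on the node global
// object so it survives across requests for the life of the process.
global.votes = global.votes || {};

function score(poemId, callback) {

  // The only database hit per slice: a single findById to pull the poem
  // document, which carries the id of the round currently in progress.
  Poem.findById(poemId, function (err, poem) {
    if (err) return callback(err);

    // Key into the in-memory bucket for the current round and walk it
    // directly – no query, no wire transfer, no deserialization.
    var roundVotes = global.votes[poem.round] || [];
    var totals = {};

    roundVotes.forEach(function (vote) {
      totals[vote.word] = (totals[vote.word] || 0) + vote.quantity;
    });

    callback(null, totals);
  });
}
```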

Of course, if the power shuts off and the server dies in the middle of the round, all of the un-persisted vote data sitting in memory would be lost (the poem document itself, though, and all of the words locked up to that point, would be unaffected). But Oversoul is about as far from any kind of business- or life-critical system as could ever be imagined, and the theoretical risk of data loss is a small price to pay in order to achieve the kind of scale necessary to make the final language persuasively unattributable.