Scoring routine optimizations

After implementing the new everything-in-memory approach to the scoring in Exquisite Haiku, I enlisted the help of Eric and Wayne on Friday morning and spent some time combing through the two methods that do the heavy lifting – vote.score() and poem.score() – and tried out a progression of small changes on the code that yielded really dramatic speed increases. These kinds of focused “algorithms” optimizations can often be more academic than practical – more often than not, the pain points that really choke up performance tend to crop up at the integration level, in the plumbing code that wires up the various chunks of the system and makes the big, architectural choices about what gets stored where, how, and when. For example, the big shift away from writing the vote data into Mongo in favor of leaving it in RAM was this class of optimization, and, indeed, it resulted in a far greater one-off performance gain than the sum effect of everything that follows here.

High-level interpreted languages (JavaScript, Python, Ruby) generally supply carefully-chosen, native implementations of most of the speed-critical, algorithmically non-trivial processes that you need in workaday application development (sorting comes to mind), and it’s usually unlikely that a custom implementation would be faster. Exquisite Haiku is an interesting exception in this regard, though, because the scoring routine is both (a) pretty irregular from a computational standpoint and (b) incredibly central in the application – if there are 10,000 active votes in a given word round and the poem is running on a 500ms scoring interval, the vote.score() method is run 10,000 times every 500ms, or 1,200,000 times a minute. Changes that shave a couple millionths of a second off the runtime of the method can add up to savings of dozens or hundreds of milliseconds in production, depending on the volume of data that’s being worked with.

The goal with Exquisite Haiku is to build it so that 1000 people can participate in a poem running on a 1000ms scoring interval with 5-minute word rounds. What does that require from a performance standpoint at the level of the scoring routine? It depends on the frequency with which people make point allocations in the voting phase, which is an unknown at this point. The voting rate could probably be regulated to some extent by cutting down the starting point allotments, which would make people a bit more trigger-shy on the allocations. But that’s a cop-out. My guess is that people would average, at most, about one vote per ten seconds over the course of the five-minute word round. So, 6 (votes per minute) x 5 (minutes in the word round) x 1000 (players) would add up to 30,000 active votes at the end of the round.

Assuming a 1000ms scoring interval, my (really, quite uninformed at this point) guess is that the scoring routine would need to max out at about 500ms in order for the application to stay on its feet and not fall behind its own rhythm. In the worst-case scenario, if the scoring slice takes longer than the scoring interval to complete, you’d have a situation where the application initiates a new scoring computation while the previous one is still running, which would choke things up even more and likely send the whole affair into a sort of vicious circle of over-scheduled scoring. Honestly, I’m really curious about exactly how this will play out, and how resilient the whole system will be to severe overload.

So, the algorithmic efficiency of the scoring process matters here, a lot. I wrote a little benchmarking routine for the vote.score() method that just runs the function on a single vote X number of times. This is a simplistic approach to benchmarking that’s perhaps a bit dubious when used as a point of comparison across different runtimes, but all that matters here is the relative performance of different passes on the code in any given runtime.

The benchmark:
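Something like this minimal sketch – the Vote fixture, the quantity, the `applied` timestamp field, and the 60-second decay lifetime here are stand-ins, not the exact values from the real harness:

```javascript
// Build a single vote to score over and over. The Vote model and the
// `applied` timestamp field are assumptions about the schema.
var vote = new Vote({ quantity: 100, applied: Date.now() });

var iterations = 10000000;
var now = Date.now();
var decay = 60000; // illustrative mean decay lifetime, in ms

// Time X back-to-back executions of the scoring method
// (original two-argument signature).
var start = Date.now();
for (var i = 0; i < iterations; i++) {
  vote.score(now, decay);
}
console.log('elapsed: ' + (Date.now() - start) + 'ms');
```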

And here’s the original pass on vote.score():
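In rough outline – the exact signature, the `applied` field name, and the return shape are approximations, reconstructed from the changes listed below – it computed the decay curve and the rank integral like this:

```javascript
VoteSchema.methods.score = function(now, decay) {

  // Milliseconds since the vote was applied
  // (assumes the timestamp lives on this.applied).
  var delta = now - this.applied;

  // Churn: the current height of the vote's decay curve.
  var churn = this.quantity * Math.pow(Math.E, (-delta / decay));

  // Rank: the area under the decay curve between 0 and delta,
  // scaled down from milliseconds.
  var bound1 = this.quantity * -decay;
  var bound2 = this.quantity * -decay * Math.pow(Math.E, (-delta / decay));
  var rank = (bound2 - bound1) / 1000;

  // Return the vote's two contributions (shape assumed).
  return [rank, churn];
};
```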

10,000,000 executions run in 84,913ms. The poem benchmark, run with 100 discrete words each with 300 active votes (the 30,000-vote load required by the 1000-person, 5-minute-round, 1000ms-scoring-interval threshold), executes in 540ms, a bit over where I want it. With Eric and Wayne’s help, I made these changes:

  1. Instead of dividing by 1000 in the rank scaling computation, multiply by 0.001.
  2. this.quantity * -decay is computed twice, once in the bound1 computation and again as a component of the bound2 computation. Reuse the bound1 value in the bound2 computation.
  3. Math.pow(Math.E, (-delta / decay)) is computed twice, once in the churn computation and again to get bound2. Break this out as an “unscaled” value of the decay function, and use the value in both computations.
  4. The division at -delta / decay would be faster as a multiplication that uses the inverse of decay, computed outside of the vote scoring method. I changed the method to take decayL (the mean decay lifetime) and decayI (the inverse of the decay lifetime), and changed the calling code in poem.score() to compute 1 / decayLifetime and pass the value into vote.score().

The method now looks like this:
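Again in sketch form, with the same caveats about names, and with the four changes folded in:

```javascript
VoteSchema.methods.score = function(now, decayL, decayI) {

  // Milliseconds since the vote was applied.
  var delta = now - this.applied;

  // Unscaled value of the decay function, computed once and shared by
  // the churn and rank computations (3), with the division replaced by
  // a multiplication against the precomputed inverse (4).
  var decay = Math.pow(Math.E, -delta * decayI);

  // Churn: the current height of the vote's decay curve.
  var churn = this.quantity * decay;

  // Rank: reuse bound1 inside bound2 (2), and multiply by 0.001
  // instead of dividing by 1000 (1).
  var bound1 = this.quantity * -decayL;
  var bound2 = bound1 * decay;
  var rank = (bound2 - bound1) * 0.001;

  return [rank, churn];
};
```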

This gets the 10,000,000-execution benchmark on vote.score() down to 58,801ms, almost exactly a 30% reduction in runtime. The poem benchmark at 100-words-300-votes-per-word drops to 431ms, a 20% reduction, and under the (sort of arbitrary) 500ms barrier.
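For reference, the calling side of change (4) looks roughly like this inside poem.score() – the words/votes structures here are stand-ins for whatever the in-memory layout actually is:

```javascript
// Compute the inverse once per scoring slice, not once per vote.
var decayI = 1 / decayLifetime;
var now = Date.now();

// Score every active vote on every word in the round.
words.forEach(function(word) {
  word.votes.forEach(function(vote) {
    var score = vote.score(now, decayLifetime, decayI);
    // ...accumulate score[0] (rank) and score[1] (churn) for the word...
  });
});
```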