After implementing the new everything-in-memory approach to the scoring in Exquisite Haiku, I enlisted the help of Eric and Wayne on Friday morning and spent some time combing through the two methods that do the heavy lifting – `vote.score()` and `poem.score()` – and tried out a progression of small changes on the code that yielded really dramatic speed increases. These kinds of focused “algorithms” optimizations can often be more academic than practical – more often than not, the pain points that really choke up performance tend to crop up at the integration level, the plumbing code that wires up the various chunks of the system and makes the big, architectural choices about what gets stored where, how, and when. For example, the big shift away from writing the vote data into Mongo and toward leaving it in RAM was this class of optimization, and, indeed, it resulted in a far greater one-off performance gain than the sum effect of everything that follows here.
The goal with Exquisite Haiku is to build it so that 1000 people can participate in a poem running on a 1000ms scoring interval with 5-minute word rounds. What does that require from a performance standpoint at the level of the scoring routine? It depends on the frequency with which people make point allocations in the voting phase, which is an unknown at this point. The voting rate could probably be regulated to some extent by cutting down the starting point allotments, which would make people a bit more trigger-shy on the allocations. But that’s a cop-out. My guess is that people would average, at most, about one vote per ten seconds over the course of the five-minute word round. So, 6 (votes per minute) x 5 (minutes in the word round) x 1000 (players) would add up to 30,000 active votes at the end of the round.
Assuming a 1000ms scoring interval, my (really, quite uninformed at this point) guess is that the scoring routine would need to max out at about 500ms in order for the application to stay on its feet and not fall behind its own rhythm. In the worst-case scenario, if the scoring slice takes longer than the scoring interval to complete, you’d have a situation where the application initiates a new scoring computation while the previous one is still running, which would choke things up even more and likely send the whole affair into a sort of vicious circle of over-scheduled scoring. Honestly, I’m really curious about exactly how this will play out, and how resilient the whole system will be to severe overload.
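One common guard against that kind of pile-up (a sketch, not something from the actual codebase) is to chain `setTimeout` calls instead of using `setInterval`, so that a new slice is only scheduled once the previous one has finished:

```javascript
// Sketch (not from the actual codebase): chain setTimeout calls so that a
// new scoring slice is scheduled only after the previous one completes,
// instead of letting setInterval stack up overlapping computations.

// Pure helper: how long to wait before the next slice. If the slice
// overran the interval, fire the next one immediately.
function nextDelay(intervalMs, elapsedMs) {
  return Math.max(0, intervalMs - elapsedMs);
}

// `scorePoem` is a hypothetical stand-in for the real scoring routine.
function startScoringLoop(scorePoem, intervalMs) {
  function tick() {
    var start = Date.now();
    scorePoem(); // the (synchronous) scoring computation
    setTimeout(tick, nextDelay(intervalMs, Date.now() - start));
  }
  tick();
}
```

The trade-off is that an overloaded server degrades to running back-to-back slices at a slower effective rhythm, rather than accumulating a backlog of queued computations.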
So, the algorithmic efficiency of the scoring process matters here, a lot. I wrote a little benchmarking routine for the `vote.score()` method that just runs the function on a single vote X number of times. This is a simplistic approach to benchmarking that’s perhaps a bit dubious when used as a point of comparison across different runtimes, but all that matters here is the relative performance of different passes on the code in any given runtime.
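A minimal sketch of such a harness (the names here are illustrative; `score` stands in for the real method, and the decay lifetime is an assumed value):

```javascript
// Sketch of a benchmarking harness (illustrative, not the original code).
// Call the scoring function N times against a single vote and report the
// total wall-clock time.
function benchmark(score, vote, iterations) {
  var decayLifetime = 30000; // assumed mean decay lifetime in ms
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    // Invoke with the vote as `this`, mirroring a Mongoose instance method.
    score.call(vote, Date.now(), decayLifetime);
  }
  return Date.now() - start; // total elapsed ms
}
```

Only the relative numbers between runs matter, so the harness doesn’t bother with warm-up passes or statistical averaging.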
And here’s the original pass on `vote.score()`:
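(What follows is a reconstruction inferred from the changes described below, not the verbatim original; the return shape in particular is an assumption.)

```javascript
// Reconstruction (not the verbatim original) of the first pass on
// vote.score(), inferred from the changes described below. Each vote's
// influence decays exponentially: `churn` is its current contribution,
// and `rank` is the area under the decay curve between the moment the
// vote was applied and now, scaled from milliseconds to seconds.
// The { rank, churn } return shape is an assumption.
function score(now, decay) {
  var delta = now - this.applied; // ms since the vote was applied
  var churn = this.quantity * Math.pow(Math.E, (-delta / decay));
  var bound1 = this.quantity * -decay;
  var bound2 = (this.quantity * -decay) * Math.pow(Math.E, (-delta / decay));
  var rank = (bound2 - bound1) / 1000;
  return { rank: rank, churn: churn };
}
```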
10,000,000 executions run in 84,913ms. The poem benchmark, run with 100 discrete words each with 300 active votes (the 30,000-vote requirement needed for the 1000-person, 5-minute-round, 1000ms-slicing-interval threshold), executes in 540ms, a bit over where I want it. With Eric and Wayne’s help, I made these changes:
- Instead of dividing by 1000 in the rank scaling computation, multiply by 0.001.
- `this.quantity * -decay` is computed twice, once in the `bound1` computation and again as a component of the `bound2` computation. Reuse the `bound1` value in the `bound2` computation.
- `Math.pow(Math.E, (-delta / decay))` is computed twice, once in the churn computation and again to get `bound2`. Break this out as an “unscaled” value of the decay function, and use the value in both computations.
- The division in `-delta / decay` would be faster as a multiplication that uses the inverse of decay, computed outside of the vote scoring method. I changed the method to take `decayL` (the mean decay lifetime) and `decayI` (the inverse of the decay lifetime), and changed the calling code in `poem.score()` to compute `1 / decayLifetime` and pass the value into `vote.score()`.
The method now looks like this:
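(Again a reconstruction with the three changes applied, rather than the verbatim code; as above, the `{ rank, churn }` return shape is an assumption.)

```javascript
// Reconstruction (not the verbatim code) of vote.score() with the three
// changes applied: multiply by 0.001 instead of dividing by 1000, reuse
// `bound1` inside the `bound2` computation, and compute the decay factor
// once from a precomputed inverse lifetime.
// decayL: mean decay lifetime (ms); decayI: its inverse (1 / decayL),
// computed once per scoring pass by the calling code.
function score(now, decayL, decayI) {
  var delta = now - this.applied; // ms since the vote was applied
  var unscaled = Math.pow(Math.E, -delta * decayI); // decay factor, once
  var churn = this.quantity * unscaled;
  var bound1 = this.quantity * -decayL;
  var bound2 = bound1 * unscaled;
  var rank = (bound2 - bound1) * 0.001;
  return { rank: rank, churn: churn };
}
```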
This gets the 10,000,000-execution benchmark on `vote.score()` down to 58,801ms, almost exactly a 30% reduction. The poem benchmark at 100-words-300-votes-per-word drops to 431ms, a 20% reduction, and under the (sort of arbitrary) 500ms barrier.