If you’ve been reading here for a while, you’ll know that we’ve been doing a lot of number-crunching at Enerjy lately. We’re on a quest to find a correlation between metrics and defect rates, and to answer the question: “what metrics are good predictors of bugginess?”
There is a different but, in a way, very similar quest going on at Netflix, where, for the past 13 months or so, they have been running a competition to try to improve their recommendation engine. That’s the software that figures out that, if I gave Bambi five stars, I would probably also enjoy watching Reservoir Dogs. Clearly, improving the recommendation engine is important to Netflix: back in October 2006 they offered up $1m as a prize to anyone who could improve their current algorithm by 10%. Despite the best efforts of the nearly 24,000 teams working on the problem, no one has achieved the 10% improvement yet, and as of now (December 2007), the best improvement stands at 8.5%.
But the job of a recommendation engine is very similar to what we are doing here. We’re analyzing tens of thousands of source code files, collecting data on a couple of hundred metrics for each one, and then statistically correlating those metrics with the number of defects that were found in each file. The good news is that we’ve found some strong correlations between certain combinations of metrics and defect rates, which allow us to make a pretty accurate prediction of whether the code we are looking at is going to be bug-prone. We’ll be launching a product based on that analysis in the new year. Once we’re done with that, maybe we’ll take a crack at the Netflix Prize…
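To make the correlation step a little more concrete, here’s a minimal sketch in Python. The per-file numbers are invented for illustration (the real analysis covers tens of thousands of files and a couple of hundred metrics), and the Pearson coefficient shown is just one of the standard correlation measures such an analysis might use — not necessarily the one we use.

```python
# Sketch: correlating one code metric (say, a complexity measure)
# with per-file defect counts. All data here is hypothetical.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: (metric value, defects found) for eight files.
complexity = [3, 18, 7, 25, 4, 30, 12, 9]
defects    = [0,  4, 1,  6, 0,  7,  2, 1]

r = pearson(complexity, defects)
print(f"correlation between complexity and defects: r = {r:.2f}")
```

A value of r near +1 would suggest the metric is a promising defect predictor for this code base; in practice you’d combine many metrics and validate the prediction against held-out files.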
By the way, if you’re interested in reading more about number crunching, and some of its applications in everyday life, I can highly recommend Super Crunchers by Ian Ayres. Fascinating stuff.