I have always been suspicious of using the McCabe Cyclomatic Complexity metric (which counts the number of linearly independent paths through a module's control flow) as a measure of quality. Common sense dictates that the more paths through a program, the more complicated it is; but does that really mean parts of my program have bugs if they have a high McCabe value?
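To make the metric concrete: for structured code, complexity can be approximated as the number of decision points plus one. The sketch below is my own minimal illustration in Python (the function name and the decision-point rule used are simplifications, not the official McCabe tooling), counting branching constructs in a parsed syntax tree.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of a Python snippet.

    Uses the simplified rule for structured programs:
    CC = number of decision points + 1.
    """
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        # Each of these constructs adds one branch to the control flow.
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" short-circuits at two points.
            decisions += len(node.values) - 1
    return decisions + 1

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(x):
        pass
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # two ifs + one loop -> CC of 4
```

A straight-line function with no branches scores 1; every extra `if`, loop, or short-circuit operator pushes the value up.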
The metric was created in 1976, yet there is little published evidence from real-world projects over the last 30 years to indicate how useful it actually is as a measure of quality. Many people are somewhat familiar with the thresholds in the SEI table, although there has been precious little research attempting to correlate a McCabe value in excess of, say, 10 with an increased probability of "bugginess."
Over the last year, we have performed a historical analysis of tens of thousands of source code files, applying individual metrics to them, of which McCabe was one. For each file, we analyzed the metrics, along with the defect rates for that file, and did the correlation.
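The analysis step can be sketched roughly as follows. This is my own illustrative reconstruction, not the study's actual code: I assume each file is reduced to a (CC value, defect count) pair, and for each CC value we compute the fraction of files that had at least one defect.

```python
from collections import defaultdict

def fault_probability_by_cc(files):
    """For each cyclomatic complexity value, compute the fraction of
    files at that value that contained at least one defect.

    `files` is an iterable of (cc_value, defect_count) pairs --
    a hypothetical data model; the paper describes the real one.
    """
    totals = defaultdict(int)   # files seen per CC value
    faulty = defaultdict(int)   # of those, files with defects
    for cc, defects in files:
        totals[cc] += 1
        if defects > 0:
            faulty[cc] += 1
    return {cc: faulty[cc] / totals[cc] for cc in totals}

# Toy input, not the study's data.
sample = [(11, 0), (11, 0), (11, 1), (38, 2), (38, 0), (74, 3)]
print(fault_probability_by_cc(sample))
```

Plotting the resulting per-CC probabilities against the CC values gives a curve of the kind described below.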
The graph below shows the correlation of Cyclomatic Complexity (CC) values at the file level (x-axis) against the probability of faults being found in those files (y-axis).
The results show that files with a CC value of 11 had the lowest probability of being fault-prone (28%). Files with a CC value of 38 had a 50% probability of being fault-prone. Files with CC values of 74 and up had a greater than 98% probability of being fault-prone.
From this analysis, my suspicions about this metric have been quashed. If we know nothing else about a file except that it has a high Cyclomatic Complexity value, we now have more reason to believe it is likely to cause problems (keeping in mind, of course, that there are no guarantees about anything in this life).
Our technical paper provides more information about how the data was collected and how the model to determine fault-prone files was applied.