McCabe Cyclomatic Complexity: the proof in the pudding
February 13th, 2008 by Rich Sharpe. Posted in Software Quality, Software Quality Metrics
I have always been suspicious of using the McCabe Cyclomatic Complexity metric (which counts the number of linearly independent execution paths through a module) as a measure of quality. Common sense dictates that the more paths through a program, the more complicated it is; but does that really mean that the parts of my program with a high McCabe value have bugs?
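To make "counting paths" concrete, here is a minimal sketch that approximates the McCabe value of each Python function as its number of decision points plus one (for a single routine this matches the graph formula V = E - N + 2P). Which constructs count as decisions is an assumption on my part; real metric tools differ on details such as comprehensions, `assert` and `match`/`case`, so treat the numbers as illustrative.

```python
import ast

# Node types counted as one decision point each (an assumed rule set,
# not a reference implementation of any particular tool).
_DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor,
                   ast.While, ast.ExceptHandler)


def _decisions(node: ast.AST) -> int:
    """Count decision points inside `node`.

    Simplification: nested functions are also counted in the
    enclosing function.
    """
    count = 0
    for child in ast.walk(node):
        if isinstance(child, _DECISION_NODES):
            count += 1
        elif isinstance(child, ast.BoolOp):
            # `a and b and c` short-circuits: one extra path per extra operand.
            count += len(child.values) - 1
    return count


def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Return an approximate McCabe value (decisions + 1) per function."""
    tree = ast.parse(source)
    return {
        fn.name: _decisions(fn) + 1
        for fn in ast.walk(tree)
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


sample = """
def sign(x):
    if x < 0:
        return -1
    elif x == 0:
        return 0
    return 1

def in_range(x, lo, hi):
    return lo <= x and x <= hi
"""

print(cyclomatic_complexity(sample))  # {'sign': 3, 'in_range': 2}
```

`sign` has two decisions (the `if` and the `elif`), so three independent paths; `in_range` gets one extra path from the short-circuiting `and`.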
The metric was created in 1976, yet over the last 30 years there has been little published evidence from real-world projects to indicate how useful it actually is as a measure of quality. Many people are somewhat familiar with the thresholds in the SEI table, but there has been precious little research that attempts to correlate a McCabe value in excess of, say, 10 with an increased probability of “bugginess.”
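For reference, the SEI thresholds are usually quoted as four risk bands. The sketch below encodes those bands as they are commonly cited (the exact cut-offs and labels here are assumptions worth checking against the SEI source); note that they are a rule of thumb, not the kind of measured correlation discussed in this post.

```python
# Risk bands as commonly quoted from the SEI table (assumed cut-offs).
SEI_BANDS = [
    (10, "simple program, low risk"),
    (20, "more complex, moderate risk"),
    (50, "complex, high risk"),
]


def sei_risk(cc: int) -> str:
    """Map a cyclomatic complexity value onto an SEI-style risk band."""
    if cc < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    for upper, label in SEI_BANDS:
        if cc <= upper:
            return label
    # Anything above the last cut-off.
    return "untestable, very high risk"


print(sei_risk(11))  # more complex, moderate risk
```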
Over the last year, we have performed a historical analysis of tens of thousands of source code files, applying a number of individual metrics, McCabe among them. For each file, we analyzed the metric values alongside that file's defect rate and correlated the two.
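A much simpler version of the idea can be sketched by bucketing files by CC value and measuring, per bucket, the fraction of files that turned out to be faulty. This is not the model from our technical paper: here "fault-prone" is assumed to mean "at least one recorded defect", and the input data is made up purely for illustration.

```python
from collections import defaultdict


def fault_probability_by_cc(files, bucket_width=10):
    """Estimate P(fault-prone) per CC bucket from (cc, defect_count) pairs.

    Assumption: a file counts as fault-prone if it has at least one
    recorded defect. Buckets are keyed by their lower edge.
    """
    totals = defaultdict(int)
    faulty = defaultdict(int)
    for cc, defects in files:
        bucket = (cc // bucket_width) * bucket_width
        totals[bucket] += 1
        if defects > 0:
            faulty[bucket] += 1
    return {b: faulty[b] / totals[b] for b in sorted(totals)}


# Hypothetical (cc, defect_count) history, not data from the study.
history = [(4, 0), (8, 0), (9, 1), (15, 0), (17, 1), (42, 1), (45, 0), (80, 1)]
print(fault_probability_by_cc(history))
```

With real data, plotting these per-bucket fractions against CC gives a rough, model-free version of the kind of curve described below.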
The graph below shows the correlation of Cyclomatic Complexity (CC) values at the file level (x-axis) against the probability of faults being found in those files (y-axis).
The results show that files with a CC value of 11 had the lowest probability of being fault-prone (28%). Files with a CC value of 38 had a 50% probability of being fault-prone, and files with CC values of 74 and up had a probability of 98% or more.
From this analysis, my suspicions about this metric have been quashed: if we know nothing else about a file except that it has a high Cyclomatic Complexity value, we now have more reason to believe that it is likely to cause problems (keeping in mind, of course, that there are no guarantees about anything in this life).
Our technical paper provides more information about how the data was collected and how the model to determine fault-prone files was applied.
5 responses to “McCabe Cyclomatic Complexity: the proof in the pudding”

February 16th, 2008 at 9:24 pm
[…] Keith Braithwaite has made an interesting observation here. The basic idea is that code written with TDD has a lower Cyclomatic Complexity per function than code that has not been written with TDD. If this is true, it could imply fewer defects. […]
February 18th, 2008 at 11:30 pm
One could argue that I could break a complex program into many simple modules, each with no more than two paths to follow, thereby reducing the chance of faults to a minimum. What that doesn't account for, however, is the added complexity of integration, which isn't part of the McCabe metric.
My view is that McCabe numbers should be used in a relative sense: the McCabe complexity of one solution compared with similar types of solutions. That would prevent trading lower module complexity for higher integration complexity.
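The arithmetic behind this point can be made explicit. Assuming each routine is a single-entry, single-exit graph with only binary decisions (so a routine with $D$ decision points has $V = D + 1$), and that splitting it into $k$ routines neither adds nor removes decisions ($D_1 + \dots + D_k = D$):

```latex
\max_{i} V_i = \max_{i} \,(D_i + 1) \;\le\; D + 1,
\qquad
\sum_{i=1}^{k} V_i = \sum_{i=1}^{k} (D_i + 1) = D + k .
```

For example, a routine with $D = 20$ ($V = 21$) split into $k = 10$ routines of two decisions each gives $V_i = 3$ per routine but a total of 30. The per-module value falls while the total rises by $k - 1$, and the paths created by the calls between the $k$ modules are not captured by either figure.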
February 19th, 2008 at 12:57 pm
@Stat
Agreed: if a complex program were refactored into many smaller modules, faults might be reduced, since the code could become easier to read and understand, and you might well find a bug or two while doing it.
The results we posted show that if you are maintaining legacy code with a high McCabe value, and the file is part of the core business functionality, then it may be a good place to start. Note that this graph is based on knowing nothing else about the file (including its size).
Your point about integration complexity is interesting: if something like dependency analysis were factored in, I'm sure the fault-proneness probabilities would produce a different graph.
However, I'm not sure how different the trend would be if we grouped by type of solution, as the McCabe metric was designed to be independent of solution type.
April 15th, 2008 at 5:26 pm
After reading the paper and the blog post, it is still unclear to me what “fault-prone” means. Is the graph showing the probability that the file will be the most troublesome of all the source files in a project? Can you clarify, Rich? Thanks.
May 13th, 2008 at 2:22 pm
[…] have anywhere to put it. For example, if you follow this blog you’ll have seen Rich’s post on our analysis of Cyclomatic Complexity. Well, just for a start I’ve got an updated version […]