We all know that finding bugs as soon as possible saves time and money. Static analysis is one of the best ways of finding bugs early, yet in my experience very few development teams have built it into their process. This post takes a look at five of the most common objections to using static analysis, and gives some suggestions for overcoming them.
1. “Static analysis is just for finding style mistakes, not real bugs.”
There are two problems with this objection. First, it’s no longer true. One of our customers identified 50 bugs the first time they used static analysis. The cost of those bugs reaching even QA would have exceeded the cost of bringing in static analysis. If any of those bugs had reached production, the costs would have been even higher. At another customer site, we started the day by presenting a brief summary of static analysis results to the development team. Before we left that afternoon, over 20 bugs had been fixed in the code.
There is an ongoing debate in the academic community on the effectiveness of static analysis for identifying bugs. My view is that static analysis is fundamentally limited in its ability to detect bugs. Static analysis can go some way towards identifying whether or not your code reliably implements the algorithm you’ve coded. It has no way of telling whether that algorithm is appropriate for solving the problem at hand. However, even if static analysis finds only some of your bugs, say one in 10, why on earth would you not use it, so that you can concentrate on finding the hard ones?
One of my favorite analogies here is with medical practice. There is evidence that maybe thousands of lives could be saved every year if hospital staff simply washed their hands more often. Washing your hands won’t cure cancer or AIDS. People will still get sick. But if you have a simple measure that can solve part of a problem, it’s stupid not to use it.
The second problem with the objection is that it undervalues style. Just suppose that static analysis couldn’t find bugs, and could only ensure that your code was consistent with your style guidelines. How valuable would that be? Think about why you have style guidelines in the first place. There may be some practices in there that are designed to prevent bugs, but the primary purpose is to make sure that all of the code in the project looks the same. That way anyone on your team can look at any code in your project and at least have some signposts to help them find their way around. In an often-quoted article, Peter Hallam claims that professional developers spend 75-80% of their time understanding existing code. Anything you can do to make that process more efficient is likely to offer significant payback.
If you have only ever worked with responsible developers who instinctively use coding standards then this argument probably sounds rather forced. As someone who has worked with a developer who simply couldn’t get the hang of indenting his code consistently, I can’t begin to explain how hard it is to understand code when all your subconscious cues for detecting control flow have been destroyed. I remember another developer I worked with whose code was extremely robust but who didn’t use consistent coding standards. It always seemed to me that his code worked better than it ought to, and I found it very hard to maintain.
2. “It’s too noisy.”
A common reaction when talking to developers about static analysis tools is a rolling of the eyes and the recounting of the time they tried to use Lint on their project. It ran for hours and produced tens of thousands of warnings. No one had any idea what to do with the reams of data from the tool, so it went into a folder to be looked at some day and that was the end of the experiment.
The solution to this is to choose a tool that only generates data you are going to use, and to configure it accordingly. First, different static analysis tools have different philosophies on how to deal with false positives. A false positive is a situation where the tool reports a problem with the code, even though the code is actually correct. Typically, the more ambitious the rule, the greater the likelihood of false positives. For example, highlighting class names that don’t meet some predefined template is straightforward and would never produce false positives. Identifying possible race conditions in a multi-threaded program is more valuable, but also much more likely to produce false positives.
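To make the class-name example concrete, here is a minimal sketch of such a rule in Java. The template (UpperCamelCase) and the names ClassNameRule and violations are assumptions chosen for illustration; they are not taken from any particular tool, and a real analyzer would collect the class names from the parsed source rather than from a hand-written list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ClassNameRule {
    // Assumed template for this sketch: an upper-case letter followed by letters or digits.
    private static final Pattern TEMPLATE = Pattern.compile("[A-Z][A-Za-z0-9]*");

    /** Returns the names that do not match the template, in their original order. */
    public static List<String> violations(List<String> classNames) {
        return classNames.stream()
                .filter(name -> !TEMPLATE.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical class names, as a tool might collect them from a project.
        List<String> names = Arrays.asList("Invoice", "order_item", "xmlParser");
        System.out.println(violations(names)); // prints [order_item, xmlParser]
    }
}
```

The check is a pure pattern match on a name, which is why a rule like this cannot misreport: a name either fits the template or it does not. A race-condition rule has to reason about how the program might behave at run time, and that is where the guesswork, and the false positives, come in.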
In Enerjy Code Analyzer, we took the view that a static analysis tool is only useful if it is used constantly, so we excluded any rules that had a significant likelihood of false positives. If we were writing a specialist analysis tool that perhaps would be used as part of an extensive QA process, then a different trade-off might have been appropriate. The key point is that you must match the false-positive rate of your tool to the time you have available to deal with the output.
Second, having selected a tool, you should configure it to produce only enough output to be useful to you. I describe how to do this in #5 below but, in brief, unless you’re going to take immediate action to deal with issues flagged by a particular analysis rule, turn the rule off. It’s very tempting to keep all the rules on as a todo list, but once that list becomes more than a few tens of items long, it’s too long to be useful.
3. “It’s inaccurate.”
This objection stems primarily from experience of using static analysis in the C++ world. C++ is a terrible language for tools vendors to handle: there are only a handful of people in the world capable of writing an accurate parser to read and understand C++ source files in all their template-ridden complexity. On top of this, you have to deal with the preprocessor transforming the visible source code into something quite different before it hits the compiler. Unless you get the exact same set of preprocessor defines to your static analysis tool that your compiler is using, you can expect to be overwhelmed with spurious or plain incorrect warnings.
Thankfully, the designers of Java long ago decided to do away with the preprocessor, so static analysis in Java tends to be much more robust. As long as you get the source code compatibility level and class path correct, any tool will analyze the code correctly.
The best solution to this problem, though, is to use a static analysis tool that can read from the same build scripts (or project files, if you’re using an IDE) that you use to compile your code. Most Java tools support Eclipse and Ant integration. In the C++ world, at least on Windows, the Visual Lint and LintProject tools from Riverblade allow you to run the excellent Gimpel Lint tool using the exact same settings that Visual Studio is using.
4. “It takes too long.”
I am still baffled at the number of static analysis tools that need to be explicitly executed by the developer. Eclipse made great strides with its smart incremental compiler guaranteeing that your Java code is constantly fully compiled and up to date, yet the static analysis framework within TPTP needs to be invoked from a menu. Why would you want to immediately check your code for compilation errors, but only check for bugs or security problems when you happen to have a few minutes and remember to run static analysis?
A design goal for the Enerjy Code Analyzer from day one was that it had to run whenever you saved a file and display the results of the analysis along with the compiler output. The Eclipse Checkstyle plugin also runs just like the compiler, analyzing every file as soon as it is saved and showing the results in the Problems view. No commands to invoke. No opening a new view to look at the results. Some static analysis tools even require switching to a different perspective in Eclipse to view the results!
Developers are busy, and the only way to make static analysis work is to integrate it completely into their existing workflow. If you treat a static analysis violation just the same way you would treat a compiler warning, there is no overhead to using static analysis. And if your existing tool won’t let you work that way, find one that will.
5. “I’m in the middle of a project right now - I’ll use it for my next project.”
This is related to objection #2. Sure, I’ll write clean code next time, but for now I need to deal with an existing code base that wasn’t written with static analysis in place and contains too many violations to manage.
The solution is the same. The only way to use static analysis is with a zero-tolerance policy. Your entire project must build with no static analysis warnings at all times. That means that you must use a tool that supports escapements, i.e., the ability to suppress messages from the static analysis tool in situations where you know the code is correct. This is usually achieved by inserting magic comments or, in Java, annotations.
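As an illustration of an annotation-based escapement, the sketch below uses Java’s standard @SuppressWarnings annotation with the "rawtypes" and "unchecked" tokens understood by the Java compiler. The LegacyAdapter class and its methods are hypothetical, and third-party analysis tools generally define their own tokens or comment markers, so check your tool’s documentation for the exact spelling.

```java
import java.util.ArrayList;
import java.util.List;

public class LegacyAdapter {
    // Hypothetical stand-in for a pre-generics library call that returns a raw List.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static List loadNames() {
        List names = new ArrayList();
        names.add("alice");
        names.add("bob");
        return names;
    }

    // The cast is safe because loadNames() only ever adds Strings.
    // The annotation silences the warning for this one method only,
    // and this comment records why for the next maintainer.
    @SuppressWarnings("unchecked")
    public static List<String> typedNames() {
        return (List<String>) loadNames();
    }

    public static void main(String[] args) {
        System.out.println(typedNames()); // prints [alice, bob]
    }
}
```

Keeping the suppression on the smallest possible scope, a single method here rather than the whole class, means that a new warning elsewhere in the file still shows up.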
Some static analysis is more valuable than no static analysis. Most tools have a severity setting, so try enabling only the most serious violations and then running the tool against your code. The chances are it will find few, if any, violations. The most serious violations typically correspond to bugs, and your existing unit and functional tests have probably flushed out those bugs already.
If there are still too many violations, then there’s probably a mismatch between your coding style and the default settings for the tool. Next project you can review your coding standards, but for now simply disable any rules that are too noisy to deal with. Remember, the goal is to get static analysis up and running on the project that you have today.
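The triage described above can be sketched in a few lines of Java. Everything here is hypothetical: the Violation and Severity types, the rule names, and the idea that your tool’s report can be loaded into such a list are assumptions for illustration, not any tool’s actual API. The point is simply to filter by severity and then count per rule, so you can see which rules are worth keeping on today.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ViolationTriage {
    // Declared from least to most serious so that compareTo() orders them that way.
    public enum Severity { LOW, MEDIUM, HIGH }

    // Hypothetical shape of one entry in a tool's report.
    public static final class Violation {
        final String rule;
        final Severity severity;

        public Violation(String rule, Severity severity) {
            this.rule = rule;
            this.severity = severity;
        }
    }

    /** Counts violations per rule, ignoring anything below the minimum severity. */
    public static Map<String, Integer> countByRule(List<Violation> report, Severity minimum) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by rule name
        for (Violation v : report) {
            if (v.severity.compareTo(minimum) >= 0) {
                counts.merge(v.rule, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Violation> report = Arrays.asList(
                new Violation("InfiniteLoop", Severity.HIGH),
                new Violation("UnusedValue", Severity.HIGH),
                new Violation("UnusedValue", Severity.HIGH),
                new Violation("LineTooLong", Severity.LOW));
        System.out.println(countByRule(report, Severity.HIGH)); // prints {InfiniteLoop=1, UnusedValue=2}
    }
}
```

A rule with a handful of hits at the highest severity is one to fix now; a rule with hundreds of hits is a candidate for disabling until the next project.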
Once you get the number of violations down to a manageable level, take the time to check out the remaining messages and resolve them. This may turn up a few bugs, but in most cases the solution will be to mark up the source code to suppress incorrect warnings. That’s not a bad thing in itself; if the tool is confused by the code then the next maintenance programmer probably will be too, and they’ll thank you for the explanatory comment.
You now have a static analysis safety net for your project. You may even want to round out the rule set over time; it’s usually easy to deal with the number of warnings introduced by enabling one new rule at a time. Even if you don’t, I guarantee that you’ll be as surprised and happy as I am whenever my static analysis tool reminds me that I’m about to run code containing an infinite loop. Or that I never use the value I’ve just written hundreds of lines of code to compute…