In his last post, Rich talked about using static analysis to detect incorrectly coded JUnit tests that failed to report assertion failures because they didn’t occur on the main thread. For example, the test
public void test() {
Thread t = new Thread() {
@Override
public void run() {
assertTrue(false);
}
};
t.start();
}
should fail, because the assertion will always fail. However, if you type this code into a test case and run it, the test passes. This is because failed assertions are designed to throw an exception that the JUnit framework catches and reports. However, the JUnit framework is running on the main thread and so the exception handler does not see the exception that is thrown from within the newly created thread.
Rich’s post describes a fix for the problem - catching the Exception on the worker thread and saving it in an instance variable that can be picked up by the tearDown code back on the main thread. However as with any bug, it’s worth spending some time thinking about it could automatically be detected. After all, any bug that you can find using static analysis at least saves you a run through your unit tests and at best stops you from shipping buggy code, so catching as much as you can using static analysis saves time and money.
As authors of static analysis tools, this caused quite a discussion here in the Enerjy office that goes to the heart of why static analysis is harder than it looks.
There are two ways for static analysis to fail: false positives and false negatives. A false positive error happens when the analysis tool reports a problem that doesn’t really exist. Most tools will allow you to mark the code in some way so that the tool won’t continue to report the problem, but any tool that fires too many false positives will pretty soon have users reaching for the uninstaller.
On the other hand false negatives occur when the tool fails to detect a real error. These are also dangerous, as the developer will come to rely on a clean bill of health from the tool as an indication that the code is error-free.
These two types of error are usually at opposite ends of the spectrum: if you make a rule more sensitive, then you reduce the likelihood of false negatives at the expense of increasing the likelihood of false positives. If you decrease the sensitivity of the rule, the behavior is reversed. So, writing a static analysis rule is usually a tradeoff between the two types of error.
Our approach here at Enerjy is to avoid false positives, even at the expense of missing some genuine code errors. Our experience has been that false positives quickly lead to a tool falling into disuse, and a tool that finds 80% of problems but is used is a lot more valuable than a tool that finds 85% of problems but sits on a shelf gathering dust. There are exceptions of course: for critical software it’s probably worth using the more sensitive analyzer, since the cost of letting one bug through is higher than the cost of manually reviewing all of the errors reported by the tool and separating the genuine errors from the false positives.
What makes Rich’s bug so interesting is that it’s hard to detect using static analysis, because any rule would have a reasonable chance of both false positives and false negatives. FindBugs is one of the few analyzers brave enough to take on this situation, so I’ll use that as an example.
First, note that FindBugs does correctly detect the problem in the code above. But now let’s change the code slightly:
public void test2() {
Thread t = new Thread() {
@Override
public void run() {
validateContent();
}
};
t.start();
}
private void validateContent() {
assertTrue(false);
}
All I’ve changed is to move the assertion into a helper method - a very common idiom when you need to reuse the same set of assertions in different tests. FindBugs fails to detect this problem - a false negative. Why doesn’t FindBugs detect this case?
In order to find the problem, FindBugs would have to perform what’s called inter-procedural flow analysis i.e., it has to know how program execution proceeds from one method to the next. That’s the only way it could detect that the validateContent helper method is called from the run method of a thread that’s started from a JUnit test case. It is possible to perform this kind of analysis, and some of the higher-end analyzers do, but it’s very hard, and, more importantly, it’s very slow. It would be next to impossible, for example, to include this kind of check in an analyzer that automatically checked code as you type.
Now let’s make another change to the method:
public void testSynchronous() {
Runnable r = new Runnable() {
public void run() {
assertTrue(false);
}
};
r.run();
}
This time, FindBugs reports that the assertion won’t be noticed by JUnit. But if you run the test, you’ll see that JUnit sees the assertion just fine, and fails the test. This code works because we’re not executing the code in the runnable on a different thread, we’re running it on the main thread. FindBugs has no way of knowing that and so it reports the problem just in case: a false positive.
Now, I’ve met the authors of FindBugs and they’re smart guys. They’re well aware of these issues, and they decided that, on balance, the rule was correct often enough, and involving a serious enough defect, to justify the error rates that it has. But it’s important to understand that this is a design decision on their part. Different tools will have different tradeoffs in the rates of false positives and false negatives that they’re prepared to accept and there can be no ultimate analysis tool that has lower error rates than all others. That’s why, for critical code, your best bet is to run multiple analysis tools. Each tool will include rules that didn’t fit the design parameters of the others.
So, what are you looking for in a static analysis tool? Would you prefer a rule that fired incorrectly 10% of the time over not having that rule at all? Does your answer change if that percentage moves to 25%? 50%? 75%?
1 comment »