Nigel posted a couple of times last week about the importance of coding standards. This has been a big deal for me ever since the first day, 15 years ago, that I started working on a project that was large enough to need two developers. The other developer on my team was talented and wrote honest, reliable code. The problem was, it was never clear to me that his code was that good. He just wasn’t an idiomatic programmer and so he might code a loop one way on Monday and the same loop a completely different way on Tuesday. Just simple changes in naming and use of conventions would have increased my productivity significantly as I tried to absorb and understand his code.
Intuitively, it seems clear that code you can read quickly is code that you can understand quickly. Which means faster, more thorough code reviews, and more accurate maintenance. If that’s true then, to cut straight to the point, you’d expect code that doesn’t follow standards to contain more bugs than code that does. This is a hypothesis that we can test, and one of things we’re working on here at Enerjy is finding out whether there is a link between coding standards and bugginess. In preparation for that, I’ve been running static analysis over several major open source projects. We’re not ready to talk about our final results yet, but in the mean time, I’d like to share some interesting findings from the static analysis.
For context, this is an ongoing project that we will continue to blog about. But as of now, we have analyzed about 8,300 files across 11 open source projects.
First the good news. There are two main categories of violation that never showed up throughout the entire life of the projects. The first was naming standards. One of the best things Sun did from day one with Java was to have an opinion on naming style and to religiously stick to it in all of their tutorials, books and articles. Open source contributors appear to have been paying attention; not once did we find a class, interface, or interface method name that deviated from Sun’s original standards. There was a little more variance in method names, with 86 files using non-standard method names. Field names were even more diverse (256 files with non-standard field names) although these were primarily in three projects that had their own, slightly different coding standards.
The second, slightly surprising, category of violation that never showed up, relates to use of esoteric language features like finalization and customized serialization. Not once did anyone declare a finalize method incorrectly, or attempt to explicitly call one. No-one ever caught ThreadDeath without re-throwing it. And the serialization customization methods readObject and writeObject were at least correctly declared. Although this initially took me by surprise, I now think it makes sense. I’ve been coding Java for maybe seven years and I still don’t fully get serialization, which means I have to reach for the JPL book every time I need to use it. I’m guessing I’m not alone in this, which means that these more obscure features tend to be used carefully, and with reference to the docs each time. I’m not sure making your API so complicated that users need to reach for the manual every time they use it is the best design approach, but it does seem to make sure people use the API correctly!
That was the good news. The bad news is that time was wasted by developers tracking down and fixing bugs that could have been caught instantly (using static analysis) as the code was written. I’m not going to name names, but I thought it might be interesting to share some of the code snippets that could probably have been better written. I won’t spoil the fun by explaining all the bugs - if anything isn’t obvious then feel free to post a comment.
This went 25 days before it was fixed:
String url_str;
...
url_str.replaceAll( " ", "" );
Not to be outdone, this version of the same problem existed for over 3 years until IntelliJ IDEA’s code audit found it:
String line;
while ((line = reader.readLine()) != null) {
line.trim();
Anyone care to guess what value this code returns? Fortunately this bug only lasted two days:
try
{
if(...)
{
return "Path Finder";
}
}
finally
{
return "Finder";
}
I’ll end with one of my favorites - probably because I do it so often myself. This one took 5 months to find:
public void setPropertyOnDelete(String propertyOnDelete) {
propertyOnDelete = propertyOnDelete;
}
This one only took two months:
protected PluginViewWrapper(PluginViewModel _model)
{
model = model;
}
The point here is not to single out any projects in particular. These are mistakes that we’ve all made dozens of times throughout our careers. However, there is no need for these mistakes to last beyond the first time you save your file, and I think it is an important move forward for our industry to understand that.
Leave a comment »