How it works Static code analysis Technical paper

Choose a category:

Follow up to ‘missing braces in If statements’

August 1st, 2008 by Rich Sharpe. Posted in Coding Standards, Software Quality, Static Analysis

Earlier this week on Javalobby, I posted an extract from our monthly newsletter regarding our analysis of the ‘missing braces in If Statement’ rule firing and the potential bug involved.If you omit braces and use Static Analysis tools it is a problem actually finding these bugs. Why? Because you probably have the rule turned off!Having any tool tell you of violations in your code purely due to a stylistic preference will result in thousands of false positives and eventually demotivate the developer from using the tool, therefore, for the other good causes these tools promote, in this case it is probably better to switch the rule off.Hopefully Unit Tests are in place to cover these areas if static analysis is not used. The only other way to find these bugs is manual code review (laborious, time consuming and introduces human error i.e. the bug may be missed anyway) or debugger tracing.Another interesting aspect about the post was the comments. Many of those who omit braces in If Statements do so for readability purposes. Psychologically, this may mean that these developers see readability as a higher aspect of quality than possible fault-prone code. I’m not stating that readability is not important, far from it. However from a business perspective, having the code released with less faults is a higher quality perspective.

Enerjy Services Launch Today!

July 25th, 2008 by Rich Sharpe. Posted in Coding Standards, Enerjy, Process Improvement, Software Quality

To follow from the successful launch of our free Eclipse plugin, today sees the official launch of our Service Offerings for Java Development Teams.Over the last 5 years we have been helping development teams increase the quality of their applications.From these experiences and our own experiences we have formalized a collection of offerings that can help improve code quality further.In summary our Services consist of:- An Internal Quality Code Report- Continuous Integration Services- Context Specific Standards for the free Enerjy Software plugin.More details can be found on our Services page of our website.

The Misnomer of Best Practices

June 24th, 2008 by Rich Sharpe. Posted in Coding Standards, Software Quality

One evening after sessions at the Better Software Conference, Dan North and I got into a discussion regarding the phrase ‘Best Practices’ and concluded that this term was actually a misnomer.

Let’s take a non-software analogy; wearing a seat belt in a moving vehicle.

With all the studies that have been performed over the years, one may believe that wearing a seat belt is a best practice. However, for a very small minority of cases, it is not the best thing to be wearing a seat belt. EMS and Police personnel in some countries are not required to wear seat belts, because they can respond faster unhindered by a seat belt. Also some drivers, like those in the TV show Ice Truckers, do not wear seat belts because they need to be able to jump out of their rig the moment they hear ice beginning to crack. These may be minority cases and for over 99% of us it is a good practice (never mind a legal requirement) to wear a seat belt–but it is not a Best Practice.

When we talk about Best Practices we really mean ‘the current best thing to do in a particular context’. When coding, depending on the situation, we try to solve an issue using some practice we know works effectively. Wait! Haven’t we heard this before? Isn’t this what we use to describe Patterns? On reflection, patterns would be a sub-set of what we wish to achieve and still other ‘good practices’ will need to be enforced somehow, especially early in a developers career.

The terms ‘Best Practices’ and ‘Standards’ are also used interchangeably, especially when applied to code. This is wrong! Coding standards may be formed from current ‘best practices’ but other issues such as law and external environmental concerns may mean that the code standards are not necessarily the current best way to write a piece of code.

This is just one example of the terminology problems that needs to be clarified in our profession. I don’t believe a committee can enforce this, but terminology, over time, will become more widely used and accepted as our industry matures, as has occurred in other industries over a period of time.

Many have written about language ambiguity being one of the key issues which leads to misunderstanding requirements and that a language needs to evolve so business and IT understand each other.  I would go one step further and declare that at least two formalized languages are needed. One for the Development team for lower level issues so statements such as ‘this lends itself to a Strategy Pattern’ or ‘favor composition over inheritance’ are understood by all the team members and another for Business Analysts, Project Leaders and Stakeholders to ensure no ambiguity exists in the requirements.

IMO the first language is closer to being realized than the second. Maybe interest in UML will be rejuvenated with Microsoft’s recent announcement of UML support to be added to Visual Studio Team System providing another step towards this goal.

Crazy Beans and Ugly Constructors != Builder Pattern

March 25th, 2008 by Rich Sharpe. Posted in Coding Standards, Software Quality

We are commonly taught to create objects using the JavaBean pattern, where for every variable there is a corresponding getter and setter, or by passing in attributes to the object’s constructor on creation. For example:

Employee emp = new Employee("name", "some address", "city", "state",             employee_number, salary, department, … more arguments);

Not only is this not ideal to read, you must remember to pass the values in the correct order. Even type safety will not help you if, for example, you interchange the city and state arguments. Also, if the Employee class has only two mandatory fields (name and employee number) the developer must add null entries to fields they do not want to set at creation time.This method also violates the Open/Closed principle (PDF link) where classes should be open for extension but closed for modification – thus to increase functionality, add new code (via abstraction), rather than changing old code.Using a Builder Pattern allows you create the same object in the following way:

Employee emp = new Employee.Builder("Rich Sharpe", 32)            .address("1 Java Way")            .city("Javaland")            .salary(12000.00)            .department(AccountsDept)            .build();

Not only is this more robust than the JavaBean pattern or the long unwieldy constructor, but the creation of this object is now far more elegant to read.Here is the partial Employee class:

public class Employee {  private String name;  private String address;  private long employee_number;  private double salary;  public static class Builder {    private String name;    private String address;    private long employee_number;    private double salary;    public Builder(String name, int employee_number){      this.employee_number = employee_number;      this.name = name;    }    public Builder salary(double salary){      this.salary = salary;      return this;    }    public Builder address(String address){      this.address = address;      return this;    }    //Additional setters here...    //Finally add the build method    public Employee build(){      return new Employee(this);    }  }  private Employee(Builder builder){    this.name = builder.name;    this.address = builder.address;    this.employee_number = builder.employee_number;    this.salary = builder.salary;    //Copy remaining data from Builder to Employee...  }}

Create a static ‘builder’ class within your class and provide individual setter methods. Finally add a ‘build’ method that returns an instance of its class as a parameter to a private constructor of the class you wish to build.If a specific exception needs to be thrown, then the private Employee constructor and public build() methods can handle this (remember IllegalArgumentException and IllegalStateException do not have to be specifically declared in a throws clause).Usually this type of creation pattern is used when an object will not change during its life. I chose to create a User class as for this example as a huge benefit of this type of creation is that the object is immutable and therefore thread safe.Although a user may change some attributes; such as their last name if married or address when moving, these are so rare that it may be better to just delete the current object and create another as object creation is very inexpensive and one retains the benefits of an immutable object.

Static analysis: false positives and false negatives

November 7th, 2007 by Mark Dixon. Posted in Coding Standards, Software Quality, Static Analysis

In his last post, Rich talked about using static analysis to detect incorrectly coded JUnit tests that failed to report assertion failures because they didn’t occur on the main thread. For example, the test

public void test()  {
    Thread t = new Thread() {
        @Override
        public void run() {
            assertTrue(false);
        }
    };
    t.start();
}

should fail, because the assertion will always fail. However, if you type this code into a test case and run it, the test passes. This is because failed assertions are designed to throw an exception that the JUnit framework catches and reports. However, the JUnit framework is running on the main thread and so the exception handler does not see the exception that is thrown from within the newly created thread.

Rich’s post describes a fix for the problem - catching the Exception on the worker thread and saving it in an instance variable that can be picked up by the tearDown code back on the main thread. However as with any bug, it’s worth spending some time thinking about it could automatically be detected. After all, any bug that you can find using static analysis at least saves you a run through your unit tests and at best stops you from shipping buggy code, so catching as much as you can using static analysis saves time and money.

As authors of static analysis tools, this caused quite a discussion here in the Enerjy office that goes to the heart of why static analysis is harder than it looks.

There are two ways for static analysis to fail: false positives and false negatives. A false positive error happens when the analysis tool reports a problem that doesn’t really exist. Most tools will allow you to mark the code in some way so that the tool won’t continue to report the problem, but any tool that fires too many false positives will pretty soon have users reaching for the uninstaller.

On the other hand false negatives occur when the tool fails to detect a real error. These are also dangerous, as the developer will come to rely on a clean bill of health from the tool as an indication that the code is error-free.

These two types of error are usually at opposite ends of the spectrum: if you make a rule more sensitive, then you reduce the likelihood of false negatives at the expense of increasing the likelihood of false positives. If you decrease the sensitivity of the rule, the behavior is reversed. So, writing a static analysis rule is usually a tradeoff between the two types of error.

Our approach here at Enerjy is to avoid false positives, even at the expense of missing some genuine code errors. Our experience has been that false positives quickly lead to a tool falling into disuse, and a tool that finds 80% of problems but is used is a lot more valuable than a tool that finds 85% of problems but sits on a shelf gathering dust. There are exceptions of course: for critical software it’s probably worth using the more sensitive analyzer, since the cost of letting one bug through is higher than the cost of manually reviewing all of the errors reported by the tool and separating the genuine errors from the false positives.

What makes Rich’s bug so interesting is that it’s hard to detect using static analysis, because any rule would have a reasonable chance of both false positives and false negatives. FindBugs is one of the few analyzers brave enough to take on this situation, so I’ll use that as an example.

First, note that FindBugs does correctly detect the problem in the code above. But now let’s change the code slightly:

public void test2() {
    Thread t = new Thread() {
        @Override
        public void run() {
            validateContent();
        }
    };
    t.start();
}

private void validateContent() {
    assertTrue(false);
}

All I’ve changed is to move the assertion into a helper method - a very common idiom when you need to reuse the same set of assertions in different tests. FindBugs fails to detect this problem - a false negative. Why doesn’t FindBugs detect this case?

In order to find the problem, FindBugs would have to perform what’s called inter-procedural flow analysis i.e., it has to know how program execution proceeds from one method to the next. That’s the only way it could detect that the validateContent helper method is called from the run method of a thread that’s started from a JUnit test case. It is possible to perform this kind of analysis, and some of the higher-end analyzers do, but it’s very hard, and, more importantly, it’s very slow. It would be next to impossible, for example, to include this kind of check in an analyzer that automatically checked code as you type.

Now let’s make another change to the method:

public void testSynchronous() {
    Runnable r = new Runnable() {
        public void run() {
            assertTrue(false);
        }
    };
    r.run();
}

This time, FindBugs reports that the assertion won’t be noticed by JUnit. But if you run the test, you’ll see that JUnit sees the assertion just fine, and fails the test. This code works because we’re not executing the code in the runnable on a different thread, we’re running it on the main thread. FindBugs has no way of knowing that and so it reports the problem just in case: a false positive.

Now, I’ve met the authors of FindBugs and they’re smart guys. They’re well aware of these issues, and they decided that, on balance, the rule was correct often enough, and involving a serious enough defect, to justify the error rates that it has. But it’s important to understand that this is a design decision on their part. Different tools will have different tradeoffs in the rates of false positives and false negatives that they’re prepared to accept and there can be no ultimate analysis tool that has lower error rates than all others. That’s why, for critical code, your best bet is to run multiple analysis tools. Each tool will include rules that didn’t fit the design parameters of the others.

So, what are you looking for in a static analysis tool? Would you prefer a rule that fired incorrectly 10% of the time over not having that rule at all? Does your answer change if that percentage moves to 25%? 50%? 75%?

False positives in JUnit

November 5th, 2007 by Rich Sharpe. Posted in Coding Standards, Software Quality, Static Analysis

Recently, I was fortunate enough to catch Josh Bloch’s ‘Java Puzzlers Episode VI: The Phantom-Reference Menace/Attack of the Clone/Revenge of the Shift’. One puzzle in particular concerned the use of threading in JUnit testing. This is a great example of a problem developers may face, however, I believe it also points to a much more significant problem – false positives in feedback systems.

Reproduced here (with the author’s permission) is the problem:

Q. How Often Does This Test Pass?

Test

The assumed answer may be a) or b) due to race conditions when the thread accesses the variable ‘number’. The actual answer is c) It always passes. This is because the assertion is running in a background thread to the ‘test’ method and JUnit does not support concurrency and is unaware of any action or exception thrown in the background thread, and so will exit with success. The code to resolve the issue is included at the bottom of this post.

Once you know what the problem is, it’s not that difficult to resolve. But therein lies the problem. JUnit passes this test every time. A manual code review would, in most cases, probably not catch this and, in any case, how many passing tests are even reviewed? This test will pass, the code will be shipped and it’s not until a run time problem occurs that the problem is reported back and the scratching of heads occurs as all the unit tests pass. (OK – I’ve taken a bit of a liberty here, one would hope system testing would catch this but it is not always the case).

The only really effective way to catch problems like these is through the use of static code analysis tools. FindBugs is one tool that can find and flag this particular problem. These tools are very good but not a panacea; of course some problems may still escape undetected. However, the extensibility of these code analysis tools ensures that, once discovered, rules covering these problematic areas can be added, enhancing the tool’s power and utility.

(For those interested, take a look at Puzzler number 5 – ‘Mind the Gap’ (PDF link), in which the java.io.skip(gap_to_skip) returns any value between 0 and the parameter passed in. One would wrongly assume that the return value would be that of the next element after ‘gap_to_skip. Bloch stated that in the source code of the JDK, there are 67 instances of this call, and in 56 cases the return value is not checked!)

My take from all of this? Although false positives are a fact of life, we can and should use static analysis to find those cases that can be found, and we should be diligent about adding new rules, where possible, to cover problems that crop up, so that they don’t happen again.

Solution:

Bloch showed a couple of possible solutions to this problem:

1. Create instance fields in the test to hold any exceptions generated in the background thread and set them in the thread’s run method.

volatile Exception exception;
volatile Error error;

Thread t = new Thread(new Runnable() {
public void run(){
try{
assertEquals(2, number);
} catch(Error e){
error = e;
} catch(Exception e){
exception = e;
}
}
});

2. Check those fields in a tearDown method.

//Triggers test case failure if any thread asserts failed
public void teardown() throws Exception{

if(error != null)
throw error;
if(exception != null)
throw exception;
}

The missing link - how coding standards relate to bugginess - part 1

October 8th, 2007 by Mark Dixon. Posted in Coding Standards, Software Quality

Nigel posted a couple of times last week about the importance of coding standards. This has been a big deal for me ever since the first day, 15 years ago, that I started working on a project that was large enough to need two developers. The other developer on my team was talented and wrote honest, reliable code. The problem was, it was never clear to me that his code was that good. He just wasn’t an idiomatic programmer and so he might code a loop one way on Monday and the same loop a completely different way on Tuesday. Just simple changes in naming and use of conventions would have increased my productivity significantly as I tried to absorb and understand his code.

Intuitively, it seems clear that code you can read quickly is code that you can understand quickly. Which means faster, more thorough code reviews, and more accurate maintenance. If that’s true then, to cut straight to the point, you’d expect code that doesn’t follow standards to contain more bugs than code that does. This is a hypothesis that we can test, and one of things we’re working on here at Enerjy is finding out whether there is a link between coding standards and bugginess. In preparation for that, I’ve been running static analysis over several major open source projects. We’re not ready to talk about our final results yet, but in the mean time, I’d like to share some interesting findings from the static analysis.

For context, this is an ongoing project that we will continue to blog about. But as of now, we have analyzed about 8,300 files across 11 open source projects.

First the good news. There are two main categories of violation that never showed up throughout the entire life of the projects. The first was naming standards. One of the best things Sun did from day one with Java was to have an opinion on naming style and to religiously stick to it in all of their tutorials, books and articles. Open source contributors appear to have been paying attention; not once did we find a class, interface, or interface method name that deviated from Sun’s original standards. There was a little more variance in method names, with 86 files using non-standard method names. Field names were even more diverse (256 files with non-standard field names) although these were primarily in three projects that had their own, slightly different coding standards.

The second, slightly surprising, category of violation that never showed up, relates to use of esoteric language features like finalization and customized serialization. Not once did anyone declare a finalize method incorrectly, or attempt to explicitly call one. No-one ever caught ThreadDeath without re-throwing it. And the serialization customization methods readObject and writeObject were at least correctly declared. Although this initially took me by surprise, I now think it makes sense. I’ve been coding Java for maybe seven years and I still don’t fully get serialization, which means I have to reach for the JPL book every time I need to use it. I’m guessing I’m not alone in this, which means that these more obscure features tend to be used carefully, and with reference to the docs each time. I’m not sure making your API so complicated that users need to reach for the manual every time they use it is the best design approach, but it does seem to make sure people use the API correctly!

That was the good news. The bad news is that time was wasted by developers tracking down and fixing bugs that could have been caught instantly (using static analysis) as the code was written. I’m not going to name names, but I thought it might be interesting to share some of the code snippets that could probably have been better written. I won’t spoil the fun by explaining all the bugs - if anything isn’t obvious then feel free to post a comment.

This went 25 days before it was fixed:

String url_str;
...
url_str.replaceAll( " ", "" );

Not to be outdone, this version of the same problem existed for over 3 years until IntelliJ IDEA’s code audit found it:

String line;
while ((line = reader.readLine()) != null) {
line.trim();

Anyone care to guess what value this code returns? Fortunately this bug only lasted two days:

try
{
if(...)
{
return "Path Finder";
}
}
finally
{
return "Finder";
}

I’ll end with one of my favorites - probably because I do it so often myself. This one took 5 months to find:

public void setPropertyOnDelete(String propertyOnDelete) {
propertyOnDelete = propertyOnDelete;
}

This one only took two months:

protected PluginViewWrapper(PluginViewModel _model)
{
model = model;
}

The point here is not to single out any projects in particular. These are mistakes that we’ve all made dozens of times throughout our careers. However, there is no need for these mistakes to last beyond the first time you save your file, and I think it is an important move forward for our industry to understand that.

More on coding standards

October 3rd, 2007 by Nigel Cheshire. Posted in Coding Standards

After I posted on Monday about the relationship between coding standards and maintainability, I was interested to see Bob Martin’s piece suggesting that software developers are too tolerant, too politically correct, when it comes to standards. The fact that no one style or method is right, says Martin, makes us timid - we tolerate a multiplicity of styles rather than putting a stake in the ground and saying “this is the way we do things around here.”

Martin goes on to compare software development to the martial arts, suggesting that over time, wise teachers will emerge whose teachings the rest of us will follow.

I’m not sure I subscribe to that part of his argument, but it seems to me that it’s the role of the team lead, project architect or development manager to lay down a lightweight, common set of standards on a per-project basis. That way, standards can evolve over time, but there is still an agreed “way of doing things” at least within the same project. The problem may be that I’m not sure how many people in those roles feel comfortable taking a strong position on those issues.

Coding standards, readability and maintainability

October 1st, 2007 by Nigel Cheshire. Posted in Coding Standards, Software Quality

Wendy Friedlander posed an interesting question last week (actually, three interesting questions):

What makes code maintainable? What does it mean to be readable? Why does it matter?

Friedlander makes the very valid point that there are some design principles, such as single responsibility and separation of concerns, that are more important in terms of maintainability than readability is. She backs this point up by citing her experience, common to many I’m sure, of writing code in a few weeks, then not touching it for a few months, at which time it becomes hard to understand the design again. “My coding standards and styles change constantly,” Friedlander says.

While I agree wholeheartedly with the point about design principles being more important than coding standards and style, I don’t think you can dismiss those things as irrelevant in terms of maintainability.

Friedlander says, “no matter how “awful” a solution may be, at one point it made sense to someone, so with some detective work it can be understood.” That’s true if only one person has had their hands on the code. But in the majority of cases, many people have been in and monkeyed around with it, and, in my opinion at least, without a core set of basic coding standards, the more different styles that have been applied, the more involved the “detective work” will need to be.

An easy win for code quality

September 11th, 2007 by Nigel Cheshire. Posted in Coding Standards, Software Quality

The more time I spend talking to people about how to improve the quality of their code, the more I realize that people are looking for easy wins. We’re all stretched for time, and up against it in terms of project deadlines. No-one has time to bring in a completely new way of working.

One of the easiest wins, in my opinion, is to start a concerted effort to improve the readability of your code. All the way back in 2000, Joel Spolsky observed that “It’s harder to read code than to write it”. And, when you think about it, code is read significantly more often than it is written. Here’s what Spolsky said:

“Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We’re not excited by incremental renovation: tinkering, improving, planting flower beds.
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong.”

I would finish that statement off by saying: They are probably wrong - it’s just not easy to read. And, by the way, Microsoft’s Peter Hallam claims that programmers spend between 75 and 80% of their time just understanding existing code. Imagine if that code were twice as easy to read - that would be a huge productivity boost.