Glitch Watch - Software glitch leads to KMart brawl

November 30th, 2007 by Nigel Cheshire. Posted in Glitch Watch

Toyota Prius drivers in Georgia have been surprised recently as their cars have been failing the state mandated annual emissions test. The test is usually done by plugging the emissions analyzer directly into the car’s onboard diagnostics module, but it turns out that the software used for the test is incompatible with the Prius. Normally, they would be able to stick a probe in the tailpipe exhaust, but at idle, the Prius cuts off the engine before the test can complete - thereby failing it.

In other car news, Nissan issued a recall this week for 686,500 Altimas and Sentras owing to a software glitch that causes the cars to stall at low speeds.

But our favorite story of the week comes from Wauwatosa, WI. The local KMart store was running a promotion, giving away $10 to anyone who opened a new credit card account. But a glitch in the approval software meant that everyone instantly received $4,000 credit, even if they didn’t qualify. Word quickly spread about the “free money”, and pretty soon, in the rush to get application forms, a fight broke out. Witness Sylvester Wilson said “It was a nice brawl. It came from inside to outside. If you go up there, you’ll see hair, earrings, all pulled out on the ground.”

Developers’ biggest time-sink: problem resolution

November 28th, 2007 by Rich Sharpe. Posted in Process Improvement, Software Quality

David Worthington’s recent article in SD Times is based on research results from Forrester’s “Problem Resolution Survey Results and Analysis,” and makes for interesting reading. The article states that “the biggest time-sink in the application production life cycle [receives] the least regard from development managers.” The time-sink to which Worthington is referring? Investigating and resolving application problems.

A couple of other gems from the article:

“The respondents spend almost three out of every 10 hours (29 percent) in various stages of troubleshooting: documenting, reproducing or testing. On the average, a problem takes six days or more to resolve, and one in four of the problems reported by a QA or test group are returned as irreproducible.”

“Of the time spent on defect resolution, 26 percent is spent reviewing information, 34 percent on reproducing the behavior, and the remaining 40 percent goes toward isolating the root cause of the problem.”

Someone more cynical than me may wonder why there is no time left over to actually code and resolve the problem! Seriously though, these numbers reinforce the need to continue investigating different ways of building more robust code in the first place, meaning to detect possible bugs earlier in the development life-cycle and to implement a program of continual process improvement.

The article does not divulge any specific methodologies these projects use. It would be interesting to know if any were using agile techniques such as incremental development or TDD (or even doing any unit testing - in our experience, most teams don’t).

Surprisingly, only 66% of managers would be interested in a solution to these problems, even if “it created significant efficiencies and improved quality” (two somewhat subjective dependencies). This reflects a serious attitudinal problem: for the remaining 34% it smells to me like: “post deployment this is someone else’s problem.”

By the way, these issues are not confined to niche areas: the findings were universal across verticals and enterprises.

Beautiful code: elegance vs. results

November 26th, 2007 by Rich Sharpe. Posted in Software Quality

Last week, I started reading Beautiful Code, a wonderful collection of life stories from authors including Tim Bray, Michael Feathers and Karl Fogel, with coding examples in languages ranging from Assembler and LISP to Java and Ruby.

Often, when people use the term “beautiful code”, they talk about attributes such as structure, adaptability, naming conventions, decoupling and all the stylistic attributes that make the code pretty (easy on the eye). In the book, Adam Kolawa raises an additional issue which is worth discussing: results.

For Kolawa, beautiful code means code that allows “…use and reuse without any shred of doubt in the code’s ability to deliver results…not what the code looks like but what I can do with it.”

This is an interesting viewpoint; it is generally accepted that most of the cost of a software project is in the maintenance phase, reworking and bug fixing. So the primary focus of code quality efforts is often on readability, with the expectation that others will take over maintenance of the code later.

Although “elegance” and “results” are related, there is a subtle difference: does the code do what I want it to do efficiently, and is it maintainable?

We’ve all had to write ugly code in the past for different reasons. I recall my first programming job, working with Ada on a radar system where, to satisfy performance requirements, we changed all the case statements to if/else statements as they were a fraction faster and got us inside the timing requirements - the code “looked” worse - and was probably harder to maintain, but it did what it had to do.

Even in these days of seemingly limitless memory and processing power, with some companies like Google utilizing billion element arrays, sometimes there is still a balance to be struck between elegance and results.

Glitch Watch - Thanksgiving travel, Volvo recall

November 23rd, 2007 by Nigel Cheshire. Posted in Glitch Watch

Yesterday was Thanksgiving, and that makes this week the busiest travel week of the year in the United States. So, as you might expect, we saw a disproportionate number of software-related travel problems percolate to the top of the Glitch Watch mailbag this week. First, Air Canada experienced problems with their reservation system that caused delays for 96,000 passengers on Friday. There were more delays at Jacksonville International Airport after problems with the air traffic control software halted flights for about an hour. Then software problems were also blamed for a radio outage at Dallas-Fort Worth airport that grounded flights.

Meanwhile, in Bakersfield, CA, local restaurants have been touting higher environmental health scores than they should have been. A software problem was found to have generated wrong scores for four restaurants after inspections. The scores have been corrected and the restaurants have been invited to apply for a re-score inspection - a service that carries a $340 fee.

Finally, just today, Volvo issued a recall for 18,000 V70 and XC70 models owing to a software problem with the side impact airbag system. According to Volvo spokesperson Maria Bohlin, “in the cases of collisions with small objects, like a pole, they are not triggered as quickly as expected.”

Humbled

November 20th, 2007 by Nigel Cheshire. Posted in Software Quality

After my rant-ish post yesterday about my Windows installation woes, and how much better the whole Mac experience was, I had two humbling and related experiences today. First, I upgraded to Mac OS 10.5 - aka “Leopard”. Let’s just say that everything didn’t exactly go completely, ahem, smoothly. Then, I read Laura Thomson’s post about how there is rarely “one true way” and that any one technology doesn’t beat any other in all respects.

Yes, I continue to prefer to use my Mac (now that it’s working again) to a Windows machine. But, at the end of the day, be it Windows or OS X, it’s all just software, and we know what that means…

Inexcusable

November 19th, 2007 by Nigel Cheshire. Posted in Software Quality

The last couple of machines I’ve bought for home have been Macs. For years, I never even considered a Mac, primarily because there are just too many applications that are either not supported or poorly supported (e.g. Quicken) on the Mac platform. But, with the arrival of Intel based Macs and, in particular, Parallels Desktop for Mac, all that has changed and, like an ex-smoker, I am now a fervent Mac fan.

For reasons too complex to go into here, I needed to buy and install a new Windows PC for home. I installed it last night, and the nightmarish experience made me realize one of the reasons Macs are such good news for home users.

I ordered online at dell.com. The machine comes with a built-in sound card. But oh, you want speakers with that? You need to order them separately. Oh, you want wireless networking? Better get a third party wireless card.

But here’s what, in my opinion, is just inexcusable. I open the box, plug everything together, and run through the configuration scripts - taking the default option in every case, I might add. The first time Windows comes up, without my doing anything, I see an error message:

[Image: error.jpg]

Maybe I’m just getting old and intolerant. But I feel like I have better things to do with my time than spending it on the phone with Dell tech support trying to figure out why some piece of 3rd party software that I never even asked for is failing…

Glitch Watch - Motorized transportation edition

November 16th, 2007 by Nigel Cheshire. Posted in Glitch Watch

My life has been revolutionized since the Massachusetts Registry of Motor Vehicles introduced online, real time wait time monitoring. Since our office is about 5 minutes from the registry, I can keep an eye on the wait times and scoot over there when the lines are at their shortest. It wasn’t such a happy story for drivers in Nevada this week, where long delays were caused at the Department of Motor Vehicles offices after a software upgrade. Frank Milburn, who was at the DMV office from 9:00 am until 5:00 pm said “When I finally got up to the window the computer crashed again and the clerk went on break.” Perfect! Well, I suppose Nevada is the only state to have a stretch of highway with a speed limit of Warp 7.

Talking of warp speed, in motorcycle racing, Valentino Rossi was forced to retire from the Valencia MotoGP race last week owing to a software problem with his Yamaha YZR-M motorcycle. The retirement cost him his second place position in the world championship standings. Yamaha team boss Davide Brivio said “Something happened with the software, or let’s say something happened and the software wasn’t prepared to accept.” Exactly.

Value of the Build Process

November 14th, 2007 by Rich Sharpe. Posted in Process Improvement, Software Quality

Perhaps surprisingly, the build process is still overlooked in many of the organizations I see. Many teams do just enough to compile and package up an application, and not much more. A well-defined build process can add significantly more value than that.

I am an advocate of a full build process. What do I mean by full? I mean that a build does the following:

  1. Gets the latest source code from the Repository system
  2. Compiles and runs unit tests
  3. Runs analytics and QA gates (at development level)
  4. Produces reports
  5. Informs the team (or at least the Build Manager) if any problems occur
  6. Publishes the application to a test server so the test team can get straight to work

And, does all of this automatically, eliminating mundane, repetitive manual processes (which can, and often do, go wrong). The ultimate goal, of course, is Continuous Integration (CI), but let’s not get ahead of ourselves.
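As a rough illustration of the "run every step automatically, and tell someone when it breaks" idea, here is a minimal sketch in Java. Everything in it is hypothetical: the `BuildPipelineSketch` class, the step names and the in-memory notification list stand in for what a real Ant or Maven script and a CI server would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical build driver: step names and the notification list are illustrative only.
final class BuildPipelineSketch {
    // LinkedHashMap keeps the steps in the order they were registered.
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
    private final List<String> notifications = new ArrayList<>();

    // Registers a named step; the supplier returns true when the step succeeds.
    BuildPipelineSketch step(String name, BooleanSupplier action) {
        steps.put(name, action);
        return this;
    }

    // Runs steps in order; stops at the first failure and records a notification.
    boolean run() {
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            if (!step.getValue().getAsBoolean()) {
                notifications.add("Build failed at step: " + step.getKey());
                return false;
            }
        }
        return true;
    }

    List<String> notifications() {
        return notifications;
    }

    public static void main(String[] args) {
        BuildPipelineSketch build = new BuildPipelineSketch()
            .step("checkout", () -> true)
            .step("compile", () -> true)
            .step("unit tests", () -> false) // simulate a failing test run
            .step("publish to test server", () -> true);
        System.out.println("success: " + build.run());
        System.out.println(build.notifications());
    }
}
```

In this sketch the publish step never runs once the unit tests fail, which is the point: the test team should only ever receive a build that passed every earlier gate.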

By scheduling this process nightly (or even more frequently), the team is guaranteed to discover compilation errors that may not show up in an individual developer’s workspace. (The developer’s workspace and the build machine may not be in sync, and there may be other software that needs to be added to the build and test machines.)

Also, unit tests can be run against the integrated code, again showing any issues that may not arise on a single developer’s machine. If a problem does occur, the system can email the build manager, who can then investigate and report back to the team what the problem is.

Another huge benefit is the fact that the test team can walk in and get straight to work without the hassle of setting anything up and jumping over technical hurdles to get the application configured and working before they can start doing their job. I’ve seen examples where testers have to spend up to half a day trying to resolve these issues.

By adding analytics and reporting (i.e. going beyond the minimum requirements), management can receive automated updates of the health of the project to be prepared for any meeting with the team. You can produce a lot of reports from different plug-ins which can provide great data for constructive feedback to the team and provide visibility into the project at different levels.

Ant or Maven can be used to write scripts to perform the tasks of compiling, executing tests, reporting and setting the application up to be copied to a test server, while CruiseControl, Hudson and Continuum are all free CI servers that can perform scheduling and automate these tasks.

If you are new to this, or feel that your build process is at a ‘bare minimum’, all this may seem like a daunting task. ‘Pragmatic Project Automation’ by Mike Clark spells out how to automate the build process in less than 150 pages and even shows how to use lava lamps to indicate whether the build succeeds or fails.

CI introduces the concept that the build process gets triggered every time a change to the code or a configuration file is committed to the version control system. The two greatest benefits of CI, in my opinion, are that (a) risk is further reduced (any new defect must have been introduced by the most recent commit, and can be fixed straight away) and (b) you can produce deployable software at any time. ‘Continuous Integration: Improving Software Quality and Reducing Risk’ by Duvall, Matyas and Glover is a good book that explains this further.

Glitch Watch - Election edition

November 9th, 2007 by Nigel Cheshire. Posted in Glitch Watch

The emergency dispatch center in New Haven, CT was flooded with 911 calls from all over the country last week when a call routing system based in Colorado went haywire. Dispatchers in New Haven received more than 500 calls in one 44-minute period, from as far away as Florida, Texas and even Puerto Rico. There was no news on whether the emergency calls got redirected successfully.

Meanwhile, the London Stock Exchange ran into trouble at the end of the day yesterday, when problems with the electronic trading system meant that traders were unable to close their books for the day. The result was that the FTSE 100 - the British equivalent of the DJIA - appeared to have gone up by 2% in the late afternoon, when in fact it was down by 1.3%. According to British newspaper the Daily Telegraph, the defect comes just one week after the trading system was upgraded.

But the majority of this week’s Glitch Watch mailbag is filled with stories of election snafus. Voting was delayed by an hour in South Fulton, GA, because of problems with an electronic voting machine. Computer crashes during the Cuyahoga County, OH elections caused technicians to manually reboot the machines every 45 minutes throughout the night, just in case. More problems in Kingston, NY, where the Ulster County web site crashed, apparently through overload. (Population of Ulster County: 178,000.) Finally, in Iowa City, a software error actually caused the result of the referendum on raising the bar entry age to 21 to be reported incorrectly. Iowans were voting on whether 19- and 20-year-olds should be banned from entering Iowa City’s bars and nightclubs after 10 pm. Originally reported as 51% of voters in favor, once the programming error was found and fixed, it turned out that actually 57% were against.

Static analysis: false positives and false negatives

November 7th, 2007 by Mark Dixon. Posted in Coding Standards, Software Quality, Static Analysis

In his last post, Rich talked about using static analysis to detect incorrectly coded JUnit tests that failed to report assertion failures because they didn’t occur on the main thread. For example, the test

public void test()  {
    Thread t = new Thread() {
        @Override
        public void run() {
            assertTrue(false);
        }
    };
    t.start();
}

should fail, because the assertion will always fail. However, if you type this code into a test case and run it, the test passes. This is because failed assertions are designed to throw an exception that the JUnit framework catches and reports. However, the JUnit framework is running on the main thread and so the exception handler does not see the exception that is thrown from within the newly created thread.

Rich’s post describes a fix for the problem - catching the Exception on the worker thread and saving it in an instance variable that can be picked up by the tearDown code back on the main thread. However, as with any bug, it’s worth spending some time thinking about how it could automatically be detected. After all, any bug that you can find using static analysis at least saves you a run through your unit tests and at best stops you from shipping buggy code, so catching as much as you can using static analysis saves time and money.
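To make the fix concrete, here is a minimal sketch of the same idea in plain Java, with no JUnit dependency. It is a variation rather than Rich’s actual code: instead of an instance variable read in tearDown, a hypothetical helper (`WorkerFailureRelay`, a name invented for this example) saves the failure, waits for the worker to finish, and re-throws on the calling thread.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of JUnit: runs a task on a worker thread and
// re-throws any failure on the calling thread, where a test runner can see it.
final class WorkerFailureRelay {
    private WorkerFailureRelay() {}

    static void runOnWorkerAndRethrow(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Save the failure instead of letting it die with the worker thread.
                failure.set(t);
            }
        });
        worker.start();
        try {
            worker.join(); // wait so that any failure has been recorded
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        Throwable t = failure.get();
        if (t instanceof Error) throw (Error) t;
        if (t instanceof RuntimeException) throw (RuntimeException) t;
        if (t != null) throw new RuntimeException(t);
    }

    public static void main(String[] args) {
        try {
            runOnWorkerAndRethrow(() -> { throw new AssertionError("always fails"); });
            System.out.println("no failure seen");
        } catch (AssertionError e) {
            System.out.println("failure relayed: " + e.getMessage());
        }
    }
}
```

Called from a test method, a helper like this would let the framework’s exception handler on the main thread see the failed assertion, at the cost of blocking the test until the worker completes.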

Since we write static analysis tools, the question of automatic detection caused quite a discussion here in the Enerjy office, one that goes to the heart of why static analysis is harder than it looks.

There are two ways for static analysis to fail: false positives and false negatives. A false positive error happens when the analysis tool reports a problem that doesn’t really exist. Most tools will allow you to mark the code in some way so that the tool won’t continue to report the problem, but any tool that fires too many false positives will pretty soon have users reaching for the uninstaller.

On the other hand, false negatives occur when the tool fails to detect a real error. These are also dangerous, as the developer will come to rely on a clean bill of health from the tool as an indication that the code is error-free.

These two types of error are usually at opposite ends of the spectrum: if you make a rule more sensitive, then you reduce the likelihood of false negatives at the expense of increasing the likelihood of false positives. If you decrease the sensitivity of the rule, the behavior is reversed. So, writing a static analysis rule is usually a tradeoff between the two types of error.

Our approach here at Enerjy is to avoid false positives, even at the expense of missing some genuine code errors. Our experience has been that false positives quickly lead to a tool falling into disuse, and a tool that finds 80% of problems but is used is a lot more valuable than a tool that finds 85% of problems but sits on a shelf gathering dust. There are exceptions of course: for critical software it’s probably worth using the more sensitive analyzer, since the cost of letting one bug through is higher than the cost of manually reviewing all of the errors reported by the tool and separating the genuine errors from the false positives.

What makes Rich’s bug so interesting is that it’s hard to detect using static analysis, because any rule would have a reasonable chance of both false positives and false negatives. FindBugs is one of the few analyzers brave enough to take on this situation, so I’ll use that as an example.

First, note that FindBugs does correctly detect the problem in the code above. But now let’s change the code slightly:

public void test2() {
    Thread t = new Thread() {
        @Override
        public void run() {
            validateContent();
        }
    };
    t.start();
}

private void validateContent() {
    assertTrue(false);
}

All I’ve changed is to move the assertion into a helper method - a very common idiom when you need to reuse the same set of assertions in different tests. FindBugs fails to detect this problem - a false negative. Why doesn’t FindBugs detect this case?

In order to find the problem, FindBugs would have to perform what’s called inter-procedural flow analysis, i.e. it would have to know how program execution proceeds from one method to the next. That’s the only way it could detect that the validateContent helper method is called from the run method of a thread that’s started from a JUnit test case. It is possible to perform this kind of analysis, and some of the higher-end analyzers do, but it’s very hard, and, more importantly, it’s very slow. It would be next to impossible, for example, to include this kind of check in an analyzer that automatically checked code as you type.
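The difference between looking inside one method and following calls across methods can be sketched with a toy call graph. This is only an illustration and not how FindBugs or any real analyzer is implemented: the `CallGraphSketch` class and its string-keyed map are assumptions made for the example, and a real tool would have to build the graph from bytecode or source first, which is where much of the cost lies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: a real analyzer would derive this call graph from bytecode or an AST.
final class CallGraphSketch {
    private CallGraphSketch() {}

    // Intra-procedural check: looks only at the methods called directly by `method`.
    static boolean callsDirectly(Map<String, List<String>> calls, String method, String target) {
        return calls.getOrDefault(method, List.of()).contains(target);
    }

    // Inter-procedural check: follows calls transitively using a depth-first search.
    static boolean reaches(Map<String, List<String>> calls, String start, String target) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String method = pending.pop();
            if (!visited.add(method)) {
                continue; // already explored; this also guards against recursive calls
            }
            for (String callee : calls.getOrDefault(method, List.of())) {
                if (callee.equals(target)) {
                    return true;
                }
                pending.push(callee);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors test2 above: run() calls validateContent(), which calls assertTrue().
        Map<String, List<String>> calls = Map.of(
            "run", List.of("validateContent"),
            "validateContent", List.of("assertTrue"));
        System.out.println("direct: " + callsDirectly(calls, "run", "assertTrue"));
        System.out.println("transitive: " + reaches(calls, "run", "assertTrue"));
    }
}
```

On this two-edge graph the direct check reports false and the transitive check reports true. On a real code base the transitive search may have to visit a large part of the program for every rule, which is why it is hard to do as you type.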

Now let’s make another change to the method:

public void testSynchronous() {
    Runnable r = new Runnable() {
        public void run() {
            assertTrue(false);
        }
    };
    r.run();
}

This time, FindBugs reports that the assertion won’t be noticed by JUnit. But if you run the test, you’ll see that JUnit sees the assertion just fine, and fails the test. This code works because we’re not executing the code in the runnable on a different thread, we’re running it on the main thread. FindBugs has no way of knowing that and so it reports the problem just in case: a false positive.
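The distinction between calling run() directly and starting a thread is easy to check outside JUnit. The sketch below uses only the standard Thread and Runnable APIs; the `RunVersusStart` class and its method names are made up for this example.

```java
// Demonstrates which thread executes a Runnable's body.
final class RunVersusStart {
    private RunVersusStart() {}

    // Calls r.run() directly and reports whether the body ran on the caller's thread.
    static boolean runsOnCallerWhenInvokedDirectly() {
        Thread caller = Thread.currentThread();
        Thread[] seen = new Thread[1];
        Runnable r = () -> seen[0] = Thread.currentThread();
        r.run(); // plain method call: no new thread is created
        return seen[0] == caller;
    }

    // Hands the Runnable to a Thread and reports whether the body ran on the caller's thread.
    static boolean runsOnCallerWhenStarted() {
        Thread caller = Thread.currentThread();
        Thread[] seen = new Thread[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread());
        t.start();
        try {
            t.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] == caller;
    }

    public static void main(String[] args) {
        System.out.println("r.run():   " + runsOnCallerWhenInvokedDirectly());
        System.out.println("t.start(): " + runsOnCallerWhenStarted());
    }
}
```

The first line prints true and the second false. An analyzer that sees only an assertion inside a run() method cannot tell these two cases apart without also tracking how the Runnable is eventually invoked.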

Now, I’ve met the authors of FindBugs and they’re smart guys. They’re well aware of these issues, and they decided that, on balance, the rule was correct often enough, and the defect it catches serious enough, to justify the error rates that it has. But it’s important to understand that this is a design decision on their part. Different tools will have different tradeoffs in the rates of false positives and false negatives that they’re prepared to accept and there can be no ultimate analysis tool that has lower error rates than all others. That’s why, for critical code, your best bet is to run multiple analysis tools. Each tool will include rules that didn’t fit the design parameters of the others.

So, what are you looking for in a static analysis tool? Would you prefer a rule that fired incorrectly 10% of the time over not having that rule at all? Does your answer change if that percentage moves to 25%? 50%? 75%?