There’s an international flavor to the Glitch Watch mail bag this week.

First, to New Zealand, where some 10,000 of phone network operator Telecom’s subscribers lost service after a software fault at the company’s Pukekohe exchange last Thursday. As of yesterday, one week later, some businesses were still without service. Steve Rigby of OHUG Power Equipment estimates that his company has lost $25,000 to $30,000 in revenue since the outage began. The phone company has offered him a $30 credit on his bill.

Next, to the Netherlands, a place dear to my heart, since that’s where I met my lovely wife. (I had to say that; she reads this blog.) If you are one of the 730,000 people who filed their taxes electronically in Holland this year, I have bad news for you. A glitch in the Dutch tax office system has “rendered all the statements completed and delivered electronically this year useless.” All 730,000 people will have to resubmit their information.

Finally, to Kuwait, where customers of ISP FASTtelco found that they were mysteriously able to access other people’s Gmail accounts. The issue was apparently caused by a problem with the ISP’s DNS caching software. No official word from the ISP on the matter, but Google did confirm that the problem happened and was subsequently fixed.
Lean programming has been a popular topic at conferences over the last couple of years, largely thanks to the experience and work of Mary and Tom Poppendieck. Lean programming has its roots in lean manufacturing, a management system focused on reducing waste and empowering workers to improve processes themselves. Lean manufacturing is largely based on the work of W. Edwards Deming, the statistician who revolutionized the culture and operations of many businesses by focusing on driving quality through the whole of the organization. Deming’s work was adopted by many companies and improved their results, most notably at Ford, Toyota and Bell Labs (AT&T).

I just finished reading Dr. Deming – The American Who Taught the Japanese About Quality, written by Rafael Aguayo, who studied under Deming in the 1980s. The book is not, as I initially thought, a biography of Deming (although there is a short appendix on his life). It is a well-written explanation of Deming’s 14-point management system, littered with numerous examples covering a multitude of organizations and industries.

Interesting points made in the book include:

- Without profound knowledge, making corrections via a feedback system is just tampering, and can lead to disastrous results.
- Who is responsible for quality? 90% of the things we define as “quality” are out of the workers’ hands: training budgets, deadlines, design acceptance, tool budgets and selection. These are all management issues, yet the worker is the one often blamed for poor quality. Does that sound familiar?
- Cooperation with your competitors in R&D. In Japan, R&D costs are lower because groups from different organizations are brought together to work on the technology, sharing ideas. Once they have the technology figured out, competition in the marketplace is fierce, concentrating on features, price and performance.

This last point may seem strange to many western development managers, but we have a great example in the Eclipse IDE project. Eclipse was created by numerous groups of people for a common cause, and companies such as IBM and Borland now compete in the marketplace with Eclipse-based products such as RAD and JBuilder.

Having spent some time working as a sales manager myself, one part of the book I found tough to buy in to was the suggestion of eliminating sales quotas and targets. Aguayo presents no alternative to replace these metrics, and there may be a good reason for that: there isn’t one.

Although written almost 20 years ago, this book is suitable for anyone wishing to learn more about how to change management techniques to focus on quality throughout the business. Although it is not software industry-specific, it provides useful background for understanding many of the concepts of lean programming. Some of Deming’s management points and Aguayo’s examples may seem contradictory or even irrelevant in many development managers’ eyes, especially “stable systems” and “removal of inspections”. This is something I will blog about some more in the next few weeks.
One story this week got plenty of press, so we don’t need to give it much airtime here. Heathrow Airport operator BAA’s baggage management system failed on Tuesday this week, and service was not fully restored until Thursday.

Other stories didn’t hit the headlines, but caused their share of disruption. In Kootenai County, Idaho, a software glitch is being blamed for a tax revenue shortfall of more than $200,000 last year. According to the Idaho State Tax Commission, other counties use the same software, although no other problems have yet surfaced.

Meanwhile, Nissan issued a recall this week for more than 16,000 Nissan Murano and Infiniti EX35 cars, owing to a software problem with the airbag control unit. According to the National Highway Traffic Safety Administration, “This could result in the passenger airbag not inflating in a crash in which it was designed to do so, and increasing the risk of injury.”

But our favorite story of the week isn’t about a real software glitch at all - just an alleged one. New York Yankees fans are hopping mad after a study conducted by researchers at the University of Pennsylvania concluded that Derek Jeter was the worst fielding shortstop in baseball. (It didn’t help matters that the findings were presented at a meeting in Boston.)

Jeter didn’t seem too worried though - according to the New York Post, he dismissed the findings as “a computer glitch”.
My friend Bill pointed me to this. Warning - ripe language alert!
More than 740,000 Verizon subscribers in Southern California lost access to their home voicemail last week after a database system based in Ontario crashed. Verizon is offering affected users rebates of between $5 and $15 on their next bill.
Meanwhile, the Tokyo Stock Exchange halted trading of TOPIX index futures last week, thanks to a “platform glitch.” Apparently, the exchange has been upgrading its trading platforms after a series of problems, the last of which, in 2006, prevented normal trading for three months.
But our favorite story of the week comes from the land of bagpipes, haggis and kilts. Blink, and you could miss the wee Scots town of Forres as you ride the train from Aberdeen to Inverness. Especially since, until yesterday, the town was omitted from the automated on-board announcements, owing to a software error.
Operator ScotRail investigated, and “tracked” down the problem. “Forres is an important community to us,” said a ScotRail spokesman. I should think so: it’s home to at least two malt whisky distilleries.
I have always been suspicious of using the McCabe Cyclomatic Complexity metric (which counts the number of linearly independent paths through a module) as a measure of quality. Common sense dictates that the more paths through a program, the more complicated it is; but does that really mean parts of my program have bugs if they have a high McCabe value?
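As a quick refresher, for a single method the metric is usually computed by counting decision points (if, while, for, case labels, and the short-circuit operators && and ||) and adding one. Here’s a small illustrative Java method, entirely my own example, with the count worked out in the comments; exact counting rules vary slightly between tools:

```java
public class CyclomaticExample {

    // Decision points: the first if (1), the || inside it (1), the for (1),
    // the inner if (1), and two case labels (2) -- six in total,
    // so the cyclomatic complexity is 6 + 1 = 7.
    static String classify(int[] values, int threshold) {
        if (values == null || values.length == 0) {
            return "empty";
        }
        int above = 0;
        for (int v : values) {
            if (v > threshold) {
                above++;
            }
        }
        switch (above) {
            case 0:
                return "none above";
            case 1:
                return "one above";
            default:
                return above + " above threshold";
        }
    }
}
```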
The metric was created in 1976, yet there is little published evidence from real-world projects over the last 30 years to indicate how useful it actually is as a measure of quality. Many people are somewhat familiar with the thresholds in the SEI table, but there has been precious little research attempting to correlate a McCabe value in excess of, say, 10 with an increased probability of “bugginess.”
Over the last year, we have performed a historical analysis of tens of thousands of source code files, applying individual metrics to them, of which McCabe was one. For each file, we analyzed the metric values along with the defect rates for that file, and computed the correlation.
The graph below shows the correlation of Cyclomatic Complexity (CC) values at the file level (x-axis) against the probability of faults being found in those files (y-axis).
The results show that files with a CC value of 11 had the lowest probability of being fault-prone (28%). Files with a CC value of 38 had a 50% probability of being fault-prone, and files with CC values of 74 and up had a probability of 98% or more.
This analysis has quashed my suspicions about the metric: if we know nothing else about a file except that it has a high Cyclomatic Complexity value, we now have good reason to believe that it is likely to cause problems (keeping in mind, of course, that there are no guarantees about anything in this life).
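For illustration only, here is a toy Java helper that maps a file’s CC value onto the three data points quoted above. The straight-line interpolation between those points is my own simplification, not the model from our technical paper, and it ignores the fact that the real curve rises again for files below CC 11:

```java
public class FaultRisk {

    // Anchor points from the analysis described above:
    // CC 11 -> 28% (the observed minimum), CC 38 -> 50%, CC 74+ -> 98%+.
    // The linear interpolation between them is my own simplification.
    static double faultProbability(int cc) {
        if (cc <= 11) {
            return 0.28;                  // minimum; the real curve rises below 11
        } else if (cc < 38) {
            return interpolate(cc, 11, 0.28, 38, 0.50);
        } else if (cc < 74) {
            return interpolate(cc, 38, 0.50, 74, 0.98);
        }
        return 0.98;                      // "98% plus" for CC 74 and up
    }

    private static double interpolate(int x, int x0, double y0, int x1, double y1) {
        return y0 + (y1 - y0) * (double) (x - x0) / (x1 - x0);
    }
}
```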
Our technical paper provides more information about how the data was collected and how the model to determine fault-prone files was applied.
As you may know, we’ve spent much of the past year analyzing which code metrics actually turn out to be good predictors of bugginess. We found that the McCabe Cyclomatic Complexity metric is pretty effective at predicting how buggy a piece of code is likely to be. It’s one of the many metrics that go into the overall Enerjy score.
Rich has pulled out some interesting data from the analysis, and will be blogging about it in more detail later this week. But meanwhile, he stuck his video camera in the faces of a few well-known speakers, authors and bloggers to see what they thought.
British online gambling company Betfair is crying foul after users exploited a glitch in its online poker software. The glitch affected certain “all-in” situations - i.e. when a player stakes all their chips on a single hand. Cash prizes were awarded to the first and second place players, but the second place prize money was also paid out to the 3rd, 4th, 5th and 6th place finishers. Once people figured this out and posted it on poker sites such as twoplustwo.com, “chaos ensued”. Betfair denies that it lost as much as $8m, but it is pursuing players, trying to get its money back.
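Betfair hasn’t said what went wrong under the hood, but the symptoms are consistent with a prize-allocation loop that clamps its index instead of stopping when the prize list runs out. A purely hypothetical Java sketch of that kind of bug (not Betfair’s actual code, which has never been published):

```java
import java.util.List;

public class PayoutBug {

    // Hypothetical reconstruction only. prizes[0] is the first-place
    // amount, prizes[1] the second-place amount.
    static void payOut(List<String> finishers, long[] prizes) {
        for (int place = 0; place < finishers.size(); place++) {
            // BUG: clamping the prize index instead of stopping when the
            // prize list is exhausted pays the second-place amount to
            // every remaining finisher: 3rd, 4th, 5th and 6th included.
            int prizeIndex = Math.min(place, prizes.length - 1);
            credit(finishers.get(place), prizes[prizeIndex]);
        }
        // The fix: loop only while place < Math.min(finishers.size(), prizes.length).
    }

    static void credit(String player, long amountInCents) {
        System.out.printf("Crediting %s with %d cents%n", player, amountInCents);
    }
}
```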
Also in jolly old England, the government has launched a new software program, imaginatively called RDSAP, that is supposed to calculate an energy rating for your home. That’s an energy rating, not an Enerjy rating, by the way. After April 2008, every home that is put on the market in the U.K. will require an energy rating calculated by RDSAP. Trouble is, apparently the program doesn’t take into account energy efficiency improvements that have been made to the house since it was originally constructed. Which, given we’re talking about England, could be quite a while ago.
Finally, as you may know, we live in the Greater Boston area, so we don’t want to spend too much time talking about Super Bowl XLII. But while an alleged 97.5m people were watching the annual football fest, some Dish Network subscribers in Arkansas missed about half the game owing to a software failure in the transmission equipment at the local Fox affiliate. According to the Northwest Arkansas Times, though, all the Dish subscribers it found also had cable as a back-up. If only we New Englanders could have missed half the game, we might be a lot happier.
Scottish technology institute ITI Techmedia today announced a new R&D program that it is funding to the tune of £4.3m ($8.5m), focused on improving software quality. The “Software Integrity Engineering” program will aim to develop quality-improvement tools based on static analysis techniques. Hmm, that sounds familiar to me…
Scottish Parliament enterprise minister Jim Mather was quoted as saying, “Software underpins the performance of all sectors of the economy; therefore, new approaches that can improve the integrity of ever more complex systems will offer huge benefits to end users.”
I sometimes worry about whether we really care enough about software quality, especially given that some things can easily be improved with little effort. But it’s certainly encouraging to see that there are, in fact, others who see this as a problem too.
I caught Brian Goetz’s keynote “Concurrency is hard!!” at CodeMash 2008, and a follow-up podcast in which Goetz discusses where he believes concurrent programming is heading. Brian states that concurrency has always been a specialist discipline, but believes that changes in hardware design (the arrival of dual-core processors on almost every desktop) may force developers to become concurrent programmers whether they want to or not. He suggests that the first step is to find or build better tools that can help us catch the mistakes that cannot easily be found today.
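To see the kind of mistake he means, consider the classic lost-update bug: code that compiles cleanly, passes single-threaded tests, and only misbehaves under concurrent load. A minimal Java sketch (my own example, not one of Goetz’s):

```java
public class LostUpdate {

    static int counter = 0; // shared mutable state, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    counter++; // NOT atomic: a read, an add, and a write
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Should print 200000; on a multi-core box it almost always prints
        // less, because updates from the two threads occasionally collide.
        System.out.println("counter = " + counter);
    }
}
```

No amount of conventional testing reliably exposes this: the interleaving that loses an update may simply never occur on the test machine, which is exactly why better tooling matters.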
Goetz said: “We cannot take a program, modify it, run it on a dual-core machine and expect it to have twice the performance of the original program.”
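One way to make that concrete (my framing, not Goetz’s) is Amdahl’s law: if only a fraction p of a program’s running time can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p/n). A quick back-of-the-envelope in Java:

```java
public class Amdahl {

    // Best-case speedup on n cores when a fraction p of the work parallelizes.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Even with 80% of the program parallelized, two cores give only
        // 1 / (0.2 + 0.8/2) = ~1.67x, not 2x.
        System.out.printf("80%% parallel on 2 cores: %.2fx%n", speedup(0.8, 2));
        System.out.printf("50%% parallel on 2 cores: %.2fx%n", speedup(0.5, 2));
    }
}
```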
With that in mind, I approached Neal Ford and Andy Glover to ask whether they thought developers would have to live with the Java concurrency model, whether a new language would take its place, or whether a shift in the programming paradigm would be required to take advantage of hardware advances.