I spent three hours communicating with HP last night. At about 7:00 in the evening, my home-based HP Photosmart D7360 printer died with the error “Carriage jam. Clear the jam and press OK to continue.” Naturally, the first thing I looked for was a paper jam - no sign of any problems. I Googled the problem and was unable to come up with anything useful. OK, when all else fails, you have to resort to the dreaded Vendor Technical Support.
But this post is not about the quality of the printer, or even the software that controls it. Printers die. Hardware fails. These things happen. This post is about HP support’s buggy online chat system, and more specifically, how some thought should be put into what happens when it fails, and how to recover. I think there are some good lessons in my experience for us all.
So, being the hopeless geek that I am, I decided to forgo the phone as a means of communicating with HP tech support, preferring instead to use the online chat system. After a minute or so my call is answered, I explain my problem, and start to go through the usual diagnosis steps. Look for paper jams. Power cycle the printer. Disconnect the USB cable and wait 30 seconds. On and on. Finally, the tech decides that what I have is a hardware error, asks for my serial number, and goes away to check the warranty (the printer is only a couple of months old). Then - bam! The chat session dies. Nothing I can think of will bring it back to life.
Frustrating, yes, but not the end of the world. I start a new chat session (choosing IE as the host this time rather than Firefox, just in case). After a minute or so, my call gets connected to a different tech (possibly in a different location - maybe even a different country, who knows). It takes me a while to explain what just happened, but the tech is sympathetic. Do I have a case number, he asks. No. OK, then there is nothing he can do; we have to start over from scratch. “Is there any paper jammed in the printer?” he asks. You can guess the rest of the story from here - after about another 20 minutes of diagnosis, this time the chat window just disappears. Gone. Nothing. Finally, I resort to using the phone, and eventually, after a total of three hours, a new printer is being shipped to me.
So what’s the lesson, from a software quality standpoint?
I think that as software developers, we have a tendency to think that things will never go wrong - or at least that failures happen so infrequently that we don’t really need to worry about the workflow when they do. Exception handlers are usually the least tested parts of an application. After all, they are only going to be exercised once in a blue moon - possibly never - so why should we waste time testing them, or even thinking about how users interact with them? If HP’s chat software had a way of resuming a dropped session - or at least hooking me back up to the same person I was speaking to when the line got dropped - I would be a significantly happier customer.
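To make that concrete, here is a rough sketch of what "resumable" could look like. I have no idea how HP's chat system is actually built, so everything here (the `SessionStore` class, the agent names, the in-memory dictionary) is made up for illustration. The two ideas it tries to show are: hand out the case number at the start of the chat rather than at the end, and keep the transcript on the server so a reconnect does not start from scratch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    case_id: str
    agent: str
    transcript: list = field(default_factory=list)
    connected: bool = True


class SessionStore:
    """Hypothetical in-memory store; a real system would persist this somewhere durable."""

    def __init__(self):
        self._sessions = {}

    def start(self, agent):
        # Issue the case ID up front, so the customer already has it if the chat dies.
        session = ChatSession(case_id=uuid.uuid4().hex[:8], agent=agent)
        self._sessions[session.case_id] = session
        return session

    def record(self, case_id, speaker, text):
        # Every message is stored server-side, not just in the browser window.
        self._sessions[case_id].transcript.append((speaker, text))

    def drop(self, case_id):
        # Simulate the failure: the connection is gone, but the state is kept.
        self._sessions[case_id].connected = False

    def resume(self, case_id, available_agents):
        session = self._sessions.get(case_id)
        if session is None:
            return None  # unknown case ID: the caller has to start a new chat
        # Prefer the original agent; otherwise hand over to whoever is free,
        # who still gets the full transcript.
        if session.agent not in available_agents and available_agents:
            session.agent = available_agents[0]
        session.connected = True
        return session
```

With something like this, my second tech could have typed in a case number, seen that we had already checked for paper jams, and picked up at the warranty lookup. Just as importantly, `drop` followed by `resume` is an error path you can actually write a test for.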
How much time do you spend considering workflows under error conditions?