Tuesday, December 18, 2012

The Risk of Bugs

Dear Blogspot,

I've lately been thinking about how economists view risk, and how that analysis can help us think about software testing.

Economists refer to risk in a rather ordinary way: the risk of getting into a car wreck, the risk of a particular stove causing a fire, the risk of being killed by Dracula, or the risk of your house being flooded. A "risk", then, is some unhappy event that we wish to avoid, since we find it in our interest to be happy rather than unhappy. Risk-mitigation is an economic "good", something we value and pursue.

Economists approach choices in risk-mitigation the same way they approach choices about any economic good -- subjectively and marginally. Those all-important economic terms mean that each person has a different tolerance for risk (the subjective part), and that we weigh each additional unit of risk-mitigation against the additional cost of obtaining it (the marginal part).

Because of this, economics rarely predicts a zero-risk situation where people are involved. There are several reasons why. The first is a knowledge problem: not every possible risk is obvious in advance. The world is a complex place, and lots of unpredictable things happen. The second reason was alluded to earlier: some risks are simply too costly to eliminate given our subjective preferences. For example, we might worry about the chance of a meteor falling through our ceiling and striking us on the head while we sleep, and calculate that the only way to be completely safe is to live deep in the roots of a tall mountain. The last reason is that the desire for safety from risk is unbounded. For example, if we get ourselves a new robot and find that it is homicidal, we might worry a lot about being killed and have it reprogrammed with the "Three Laws" to prevent that. Afterwards, however, we might start worrying about it breaking our dishes while trying to clean them, and have its metal hands padded. After that, we might worry about running into it in the dark and stubbing a toe, and so on.

As you might have guessed, this same analysis can be applied to flaws in the implementation of a specific software design, commonly called "bugs". A user encountering the effects of a bug is definitely an event that they wish to avoid, and (for various reasons) one that we as engineers do not wish to introduce. Software houses typically employ QA to try to discover these flaws before a user finds them, so that they can be fixed. Other mitigation practices, such as Test Driven Development (TDD), unit tests, and test automation, are also used to prevent users from encountering bugs in their software; a small sketch of one such practice follows below.
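
To make the unit-testing example concrete, here is a minimal sketch using Python's built-in unittest module. The parse_price function and its expected behavior are hypothetical, invented purely for illustration; the point is only that each test encodes a specific bug we have decided is worth the cost of preventing.

    import unittest


    def parse_price(text):
        """Parse a price string like "$4.99" into cents (hypothetical code under test)."""
        if not text.startswith("$"):
            raise ValueError("price must start with '$'")
        dollars, _, cents = text[1:].partition(".")
        return int(dollars) * 100 + int(cents or 0)


    class TestParsePrice(unittest.TestCase):
        def test_parses_dollars_and_cents(self):
            self.assertEqual(parse_price("$4.99"), 499)

        def test_parses_whole_dollars(self):
            self.assertEqual(parse_price("$12"), 1200)

        def test_rejects_missing_currency_symbol(self):
            with self.assertRaises(ValueError):
                parse_price("4.99")


    if __name__ == "__main__":
        unittest.main()

Each test is a small, cheap purchase of risk-mitigation: we pay a little engineering time now to make one particular class of bug unlikely to reach a user later.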

However, blogspot, I would argue that pursuing a zero-bug situation is neither wise nor practical, for all the same reasons mentioned previously. Modern software is a complex interaction of many different systems, and user behavior introduces further uncertainty, making the task of even predicting every possible bug all but impossible. Further, sometimes a bug is just too expensive to fix relative to its impact. You may get a bug report that the software cannot handle data buffers larger than any reasonably available storage device, which no reasonable engineer would bother to fix. On the other hand, we may discover a bug with a very low impact on the user and a low chance of occurrence, only to find that correcting it would take hundreds of engineering hours, making the fix too costly to pursue. An example might be that your software is affected by a flaw in a specific model of video card that slightly changes the visible shade of red, and the cost of writing code to detect and work around that hardware flaw isn't worth a slight color difference. Lastly, the desire to be free of bugs is just as unbounded. A quick fix for a bug that crashes the software on boot-up is an easy call, but eventually one would start worrying about flaws that occur only in certain rare hardware environments -- it would never end.
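
As a purely illustrative sketch of that weighing process (the numbers and the worth_fixing helper below are invented for this post, not a real triage formula from any particular shop), one could compare the expected cost of leaving a bug in place against the engineering cost of fixing it:

    def worth_fixing(probability_per_user, affected_users, cost_per_incident, fix_cost):
        """Rough, hypothetical triage rule: fix the bug only if the expected
        cost of users hitting it exceeds the engineering cost of the fix.

        probability_per_user -- chance a given user ever hits the bug (0.0 to 1.0)
        affected_users       -- number of users exposed to the buggy code path
        cost_per_incident    -- estimated cost (support, lost goodwill) per occurrence
        fix_cost             -- estimated engineering cost to correct the flaw
        """
        expected_cost = probability_per_user * affected_users * cost_per_incident
        return expected_cost > fix_cost


    # A crash on boot-up: nearly every user hits it, so it is an easy call.
    print(worth_fixing(0.95, 10_000, 50, 2_000))   # True

    # The slightly wrong shade of red on one rare video card: probably not.
    print(worth_fixing(0.001, 10_000, 1, 15_000))  # False

The precise numbers are guesses, of course, and different people will fill them in differently (the subjective part again); the point is that the decision is a comparison of costs, not a pursuit of zero.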

Well, blogspot, that is all I have to say on this subject. As in my other posts, I would encourage my fellow engineers to think about our problems the same way we actually end up deciding them: by weighing the costs and the benefits of each choice. Demanding a zero-bug software product may make one seem bold and principled, but it only sets one up for disappointment, while forgoing time that could have been spent on new products or on more important improvements to existing ones.