‘WE INTERRUPT THIS BROADCAST …’
HOW A LITTLE BIT OF FAILURE CAN DO A LOT OF GOOD
by Mike Neuenschwander ~ October 29, 2008.
Filed under: Hybrid Vigor, 21st Century Risk, Social Trust Online.
I was recently talking to some German friends about their trips to the United States. Apart from the standard touristy things they found memorable about the U.S., they were all greatly impressed that they could go shopping for almost anything in the middle of the night. Even to modern Europeans, the concept of midnight shopping seems fantastic. Imagine their amazement when I explained that, in the U.S., they could go shopping on almost any holiday as well.
Today’s business culture thrives on performance, success, winning, and constant availability. The world continues on its frenzied trend toward 24×7 services, “five 9’s” of up-time, and six sigma products. The drive to succeed has provided us with all sorts of modern conveniences, and plenty of modern instances.
Perfect.
But I’d like to say a few words in defense of failure, because I believe failure has an important purpose and we can’t simply wish it away by focusing on success. In my view, systemic failures can be averted simply by introducing some planned imperfections into the systems we build. One lesson of the current financial crisis is that securities originally thought to be insulated from the housing market proved to sit directly on the financial fault line.
Here’s the problem: when a system (such as a computer network, power grid, or financial market) performs steadily for a period of time, it fades into the background and seems as certain as the rising of the sun. Over time, a complex and interdependent mesh of relationships develops. Because these dependencies aren’t explicit, it becomes nearly impossible to predict how the beating of the proverbial butterfly’s wings in one part of the system can wreak havoc in another.
Is there a way to tease out the dependencies in such networks and develop complex distributed systems that fail safely? I think there’s a simple solution: introduce the element of failure. Shoot for four 9’s instead of five. Interrupt the broadcast so that we can run the drill before the disaster strikes. Learning to fail on a regular basis could help us deal better with much larger, systemic failures in the future.
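To put rough numbers on the gap between four 9’s and five, here is a back-of-the-envelope sketch in Python. The function name `downtime_budget_minutes` is hypothetical, invented for illustration, and the calculation assumes a 365-day year and reads "N nines" as an availability of 1 minus 10 to the power of -N.

```python
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year allowed at `nines` nines of availability.

    Hypothetical helper: 4 nines means 99.99% up-time, 5 nines means 99.999%.
    """
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (4, 5):
        budget = downtime_budget_minutes(n)
        print(f"{n} nines: about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, five 9’s leaves a little over five minutes of downtime a year, while four 9’s leaves a little under 53. The difference, roughly 47 minutes a year, is the kind of budget that could in principle be spent on planned interruptions and drills.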
