February 06, 2013, 4:34 PM — It's a calm, sunny weekend day without a cloud in the sky. The barbecue is lit and beers have been cracked open. Things couldn't get any better. But lurking somewhere in the power grid is a faulty component that's been hanging by a thread for weeks. And it has picked today to be its last.
Power to the data center is abruptly cut. Uninterruptible power supplies assume the load in anticipation of the backup generator startup -- a startup that never comes due to a tripped circuit breaker. A few minutes later, the data center plunges into darkness. Burgers and beers will have to wait. There's work to be done.
This is a scenario I've seen play out in strikingly similar ways about once a year. The first I can remember was in a colocation center in downtown Los Angeles near the height of the dot-com boom. The most recent was only a few days ago, on the morning of July 4. In the first case, three subbasements' worth of data center gear in a sizable office building was unceremoniously brought down, despite the presence of an enormous facilitywide battery-backup system, three mutually redundant backup generators -- each large enough to power a small town -- and path-diverse access to two separate commercial power grids.
The exact reasons behind the outages aside, it's clear that no matter how much capital you've invested in your data center infrastructure (or in a state-of-the-art colo), you're bound to lose power someday. However, very few of us actually go to the trouble of testing our systems' response to an unexpected power outage. In my experience, the larger the data center or organization, the less likely it is to have deliberately tested a power outage. Unfortunately, these same large organizations have the infrastructural complexity that almost guarantees continued trouble even after power is restored.