May 11, 2011, 8:42 PM — I was discussing my article on the Amazon EC2 Cloud failure, and a related blog (as well as reaction from the Twittersphere) with my editor, and the issue came up of cloud-related disaster recovery falling under the responsibility of IT. Having covered IT for 15 years, I heartily disagreed.
In my experience, IT is among the first to ask the tough questions: How do we back this up? How do we bring our systems online after a failure? How do we build redundancy into our network?
The problem with cloud-based services, including software-as-a-service, is that some companies are trying to get by with little to no IT. That leaves them with no one who has the skills to ask and answer these difficult, and sometimes uncomfortable, questions.
Executives, including CFOs, are being forced to play catch-up and learn what it means to be disaster-proof. Suddenly, the fine print of a service-level agreement (SLA) is becoming the focus of boardroom discussion. How will this service provider handle downtime? What are the mechanisms for transferring our data quickly to another data center? Can you show me how you'd do this?
That last question is the one that might trip folks up. Just because service providers say they can provide failover during an outage doesn't mean they actually can. In IT, it's commonplace to run drills on the data center to see how everyone would perform in a real disaster. With cloud services, we've let this critical task go by the wayside, trusting providers to inherently have this covered.
If the Amazon outage and others like it have proven anything, it is that the onus is on whoever is responsible for the SLA to ensure that the box for disaster recovery plans can truly be checked off. In many instances, that's now you, CFOs.