3. Test that plan. Then test it again. "The cloud is the perfect place to test failures in a completely staged environment," says Donald Flood, vice president of engineering for Bizo, a business-to-business advertising network provider and Amazon Web Services customer. "You can easily create a staged environment that mirrors production and test your systems by killing running services and evaluating how your system performs under failure."
4. Create internal backup options. It took about two days for Amazon to locate and repair the problems at its data center in northern Virginia. But as soon as U.S. Tennis Association CIO and Amazon Elastic Compute Cloud (EC2) customer Larry Bonfante began to notice application sluggishness, he and his team migrated the USTA's critical systems to their own server. IT leaders must maintain internal contingency capabilities, Bonfante advises.
5. Reexamine your sourcing strategy. IT leaders have embraced multi-sourcing, but that model can make cloud continuity confusing. "The domino-effect ramifications of an outage are very complex to manage and resolve," says Fersht. As more services are built on top of cloud computing infrastructures, he adds, a seemingly isolated outage can cascade, taking down many services or an entire application environment.
Putting one service integration provider in charge of a multi-sourced arrangement will give you "one throat to choke" in the event of a failure, according to Fersht, but it can also prove problematic. "They are likely to develop an institutional knowledge of your IT processes that would be very tough to transfer in the future if you wanted to maintain a healthy competitive environment," Fersht says. "You need to have your own IT staff get smart about how cloud works, or you really do risk potentially losing control over your own IT environment."
6. Don't be cheap. The ROI of redundancy investments skyrockets in cloud collapse scenarios. Many of the companies affected by Amazon's failure could not, or would not, pay to run parallel systems in the cloud. Major Amazon Web Services customer Netflix, on the other hand, says it experienced no issues because its cloud computing model assumed one of the data centers in Amazon's four regions would go down. The company had "taken full advantage of Amazon Web Services' redundant cloud architecture," a Netflix spokesperson told The New York Times.
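For teams that want to see what tips 3 and 6 look like in practice, here is a minimal sketch of a failure-injection test in Python. It is a toy simulation, not any vendor's tooling: the Replica and StagedService classes, the zone names, and the survives_zone_loss helper are all hypothetical, invented for illustration. The idea is simply to "kill" every instance in one zone of a staged environment and check whether requests are still served.

```python
class Replica:
    """One simulated service instance in a staged environment (hypothetical)."""

    def __init__(self, name, zone):
        self.name = name
        self.zone = zone
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class StagedService:
    """Routes each request to the first healthy replica (simple failover)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def handle(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # try the next replica
        raise RuntimeError("total outage: no healthy replica left")


def kill_zone(service, zone):
    """Failure injection: take down every replica in one zone."""
    for replica in service.replicas:
        if replica.zone == zone:
            replica.alive = False


def survives_zone_loss(make_service, zone, requests=100):
    """Build a fresh staged service, kill one zone, and replay traffic."""
    service = make_service()
    kill_zone(service, zone)
    try:
        for i in range(requests):
            service.handle(f"req-{i}")
    except RuntimeError:
        return False
    return True


def single_zone():
    # Assumed layout: both replicas share one zone, so there is no redundancy.
    return StagedService([Replica("app-1", "zone-a"), Replica("app-2", "zone-a")])


def multi_zone():
    # Assumed layout: replicas are spread across two zones.
    return StagedService([Replica("app-1", "zone-a"), Replica("app-2", "zone-b")])


if __name__ == "__main__":
    print("single zone survives:", survives_zone_loss(single_zone, "zone-a"))
    print("multi zone survives:", survives_zone_loss(multi_zone, "zone-a"))
```

In this sketch the single-zone layout fails the test and the multi-zone layout passes it. A real staged environment would replace the simulated replicas with actual instances and the replayed requests with production-like traffic, but the pass/fail question stays the same.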