November 02, 2012, 8:58 PM — Amazon Web Services makes a big deal out of its availability zones, recommending that customers use multiple AZs to build clouds that are tolerant of failures like the one the company recently experienced.
But experts say using a second availability zone when you're starting up an AWS instance isn't a panacea to prevent your app from going down during an outage, nor is it as easy to do as simply flipping a switch. And it will cost you more -- sometimes more than double the cost of deploying to a single AZ.
AMAZON TAKES THE BLAME: Amazon reveals cause of outage, refunds customers
MORE CLOUD: Top 8 cities for cloud computing jobs
There are a variety of tools for customers to create highly available, fault tolerant systems within Amazon's cloud. One option is to use AWS's own services, specifically its Elastic Load Balancers (ELBs) that allow workloads to be moved from one availability zone into another. However, AWS acknowledged in a post-mortem report about the October outage that even its ELBs were impacted.
1. Architectural design
There are a variety of vendors, some call them cloud access management tools, that will manage this process of creating highly available cloud systems using AWS. RightScale and enStratus are two of the most popular. RightScale offers customers prepackaged solutions that spread workloads across AZs, or even various providers. But as Gartner IaaS analyst Kyle Hilgendorf says, "It's a cost vs. risk play. It's not easy, and it's not cheap." Highly available cloud systems simply cost more and add complexity.
Fault-tolerant applications have to be built to be horizontally scalable. In an ideal world they would be stateless, meaning they wouldn't be constantly changing with new data being inputted and saved into them. In most cases, it requires a copy of your system to be made somewhere else to ensure fault tolerance during an outage. Even when all of that is taken into account, there can still be problems.

















