3 tips for making highly available systems in Amazon's cloud

By Brandon Butler, Network World |  Cloud Computing, Amazon Web Services

RightScale CTO Thorsten von Eicken says that during the latest Amazon outage, RightScale's internal operations had trouble scaling across availability zones (AZs) in Amazon's cloud. AWS admitted it was "throttling" customers, meaning it limited how much data they could transfer from one AZ to another, a practice it has vowed to apply less aggressively in the future. The point is that even if a system is architected to be fault tolerant, unexpected problems can still arise.

There are multiple ways to architect fault tolerant systems, von Eicken says. Customers can create two active-active services, or create one active service and a "clone" standby, for example. Each has its own advantages and cost considerations.

Basic fault tolerance: In a basic fault tolerant architecture, there is a production architecture and a standby "clone architecture." If there is a failure in the master AZ, the system can be switched over to the cloned version. That switch-over usually has to be done manually, and the databases are usually replicated to Amazon's Simple Storage Service (S3) only about every 10 minutes, so when a switch-over does occur, you could lose about the last 10 minutes' worth of data, RightScale says.
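To make the "last 10 minutes" figure concrete, here is a minimal Python sketch of how much data a standby clone would be missing at the moment of a failure. It assumes snapshots are taken at a fixed interval starting from a known time; the function names and the example timestamps are hypothetical and are not taken from RightScale or AWS.

```python
from datetime import datetime, timedelta

# From the article: databases are replicated "about every 10 minutes".
SNAPSHOT_INTERVAL = timedelta(minutes=10)


def last_snapshot_before(failure_time, first_snapshot, interval=SNAPSHOT_INTERVAL):
    """Return the time of the most recent snapshot taken at or before failure_time.

    Assumes (hypothetically) that snapshots run on a fixed schedule
    starting at first_snapshot.
    """
    if failure_time < first_snapshot:
        raise ValueError("failure precedes the first snapshot")
    completed = (failure_time - first_snapshot) // interval  # whole intervals elapsed
    return first_snapshot + completed * interval


def data_loss_window(failure_time, first_snapshot, interval=SNAPSHOT_INTERVAL):
    """Return how much recent data the standby clone is missing at failover."""
    return failure_time - last_snapshot_before(failure_time, first_snapshot, interval)


if __name__ == "__main__":
    first = datetime(2012, 7, 1, 12, 0)     # hypothetical first snapshot
    failure = datetime(2012, 7, 1, 12, 27)  # hypothetical failure time
    print("Data lost at failover:", data_loss_window(failure, first))
```

Under these assumptions the loss is always less than one full interval, which is why a 10-minute replication schedule translates into losing up to about 10 minutes of data.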


Advanced fault tolerant system: A more advanced system creates two active systems running simultaneously. In this active-active setup, any instance, or even an entire AZ, can fail and the system will automatically be able to complete all its functions from another AZ that is pre-architected and ready to run. RightScale says this architecture will cost more than double a single-AZ setup, because not only do all of the services from the single AZ have to be replicated, but there are also data transfer costs that come with keeping both systems up to date in real time.
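The active-active idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Zone` and `ActiveActiveRouter` classes and the zone names are hypothetical, health is a flag set by hand rather than by a real health check, and no AWS API is involved.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A hypothetical availability zone running a full copy of the service."""
    name: str
    healthy: bool = True


class ActiveActiveRouter:
    """Spread requests over all healthy zones, skipping any that have failed."""

    def __init__(self, zones):
        self.zones = list(zones)
        self._next = 0  # index of the next zone to try

    def route(self):
        """Return the name of the zone that should serve the next request."""
        for _ in range(len(self.zones)):
            zone = self.zones[self._next % len(self.zones)]
            self._next += 1
            if zone.healthy:
                return zone.name
        raise RuntimeError("no healthy availability zone")


if __name__ == "__main__":
    zone_a, zone_b = Zone("zone-a"), Zone("zone-b")
    router = ActiveActiveRouter([zone_a, zone_b])
    print(router.route(), router.route())  # both zones serve traffic
    zone_a.healthy = False                 # simulate losing an entire AZ
    print(router.route(), router.route())  # traffic continues from the other zone
```

Because both zones already carry live traffic, losing one requires no manual switch-over. The trade-off, as RightScale notes, is paying for a second full set of services plus the data transfer needed to keep the two in sync.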


There are other options, too.

2. Application design

Sean Hull is an independent scalability and performance consultant with iHeavy in New York. Shortly after the AWS outage, he authored a blog post titled "AirBNB didn't have to fail," referring to the travel site that was one of dozens across the Internet that went down when AWS's cloud hiccupped. In the post, Hull argues there are tools Web developers can use to make their applications tolerant of outages.

Originally published on Network World.