Four tips to prepare for the next Amazon outage

By Brandon Butler, Network World |  Cloud Computing, Amazon, Amazon Web Services

It happened again: Amazon Web Services suffered its latest outage in late June. Now that the dust has settled,
customers are reassessing what lessons can be learned and how to prepare for the inevitable next one.

Compared to AWS's major outage last summer, which was caused by human error and resulted in an overloaded network,
the most recent incident
resulted from an electrical storm causing a power outage in an AWS Virginia data center. While the actual outage
only lasted about 20 minutes, the domino effect of a backup generator not kicking in, combined with software bugs
AWS had not seen before, caused about 7% of customers in the impacted area to be down, some for as much as three
hours on the evening of Friday, June 29.

READ: Amazon takes blame for outages, bugs and bottlenecks

COMPETITION HEATING UP: Amazon in the crosshairs of
Google and Microsoft

As storms ripped through the mid-Atlantic coast that Friday night and into Saturday morning, parts of sites such
as Netflix, Pinterest and Instagram were down, sometimes for as much as three hours. But it didn't have to be that
way. Software startup Newvem tracks AWS customer usage, and officials say misconfigurations by customers
exacerbated the problem on that Friday night. Newvem and Netflix have four suggestions for how the latest outage
could have been mitigated and how to prepare for future incidents.

1. Use snapshots

Backing up data is critical to ensuring high availability, and AWS gives customers the option of backing up their
Elastic Block Store (EBS) volumes with a "snapshot." EBS, a block storage service, was impacted during the latest
outage. EBS snapshots make a copy of the EBS volume and back it up in Amazon's accompanying Simple Storage Service
(S3) offering. Users have to initially back up their entire EBS volume to S3, but after that, whenever the content
of the EBS volume changes, only the new data has to be captured in another snapshot for the volume to be
recreated. Of Newvem's more than 500 customers, 45% of those with large AWS clouds, meaning more than 101
instances, did not have effective EBS snapshots.
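
To make the incremental idea concrete, here is a toy Python model of it. This is only a sketch and not how EBS or S3 are actually implemented: the names `take_snapshot` and `restore` are hypothetical, a volume is assumed to be a simple mapping of block numbers to contents, and deleted blocks are ignored for brevity.

```python
# Toy model of incremental snapshots (an illustration, not the real EBS/S3 mechanism).
from typing import Dict, List

Volume = Dict[int, bytes]    # block index -> block contents
Snapshot = Dict[int, bytes]  # only the blocks captured by one snapshot


def restore(history: List[Snapshot]) -> Volume:
    """Rebuild a volume by layering snapshots from oldest to newest."""
    volume: Volume = {}
    for snapshot in history:
        volume.update(snapshot)
    return volume


def take_snapshot(volume: Volume, history: List[Snapshot]) -> Snapshot:
    """Capture only the blocks that differ from what the history already holds."""
    previous = restore(history)
    snapshot = {i: data for i, data in volume.items() if previous.get(i) != data}
    history.append(snapshot)
    return snapshot


if __name__ == "__main__":
    history: List[Snapshot] = []
    volume = {0: b"boot", 1: b"logs", 2: b"data"}
    first = take_snapshot(volume, history)   # first snapshot copies every block
    volume[2] = b"data-v2"                   # one block changes afterwards
    second = take_snapshot(volume, history)  # second snapshot holds only that block
    print(len(first), len(second), restore(history) == volume)
```

In this model the first snapshot stores all three blocks and the second stores one, yet layering the two is enough to recreate the current volume.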

2. Ensure correct ELB configurations


Originally published on Network World.