Meanwhile, the company's cloud-based Relational Database Service (RDS) was hit both by the EBS outage and by a separate software bug. Customers whose RDS instances ran in the impacted AZ had to wait for EBS to be restored, which for most customers happened by 11 p.m. PDT. For customers whose RDS deployments spanned multiple AZs, AWS says a software bug prevented automatic failover to the unaffected AZs for some customers. AWS says it has known about the bug since April and has a mitigation for it, which is in beta and will be rolled out in the coming weeks.
While only a single-digit percentage of customers were impacted by the outage, the size of AWS's customer base meant that a large number of users were affected. Netflix, Instagram and Pinterest were among the affected customers. Netflix was partially down at times between 8 and 11 p.m. PDT on Friday, during prime-time movie-watching hours on the Pacific Coast.
Netflix Cloud Architect Adrian Cockcroft, who has in the past praised AWS for powering the company's operations, posted something of a play-by-play of the outage on his Twitter feed Friday night and into Saturday. The company, he says, has architected its systems to AWS's specifications and uses multiple AZs. That didn't seem to help on Friday, though. On Saturday, Cockcroft tweeted, "We only lost hardware in one zone, we replicate data over three. Problem was traffic routing was broken across all zones."
Shahin Pirooz, CTO and CSO of cloud provider CenterBeam, says AWS certainly shares some of the blame for this outage. "It seems like they had a house of cards that went down on them," he says. Pirooz says he's surprised so many AWS systems went down at once. "Amazon failed, their failover systems failed, AWS does own some responsibility in this," he says.
One way to prevent this kind of situation in the future, he says, is to use load balancing, Domain Name System (DNS) and disaster recovery offerings from third parties other than AWS. A variety of companies offer such services, including New Start Systems, Akamai and DynDNS. The "nirvana" situation, he says, would be giving customers the ability to federate services across multiple public cloud providers. That, he predicts, is still five to 10 years away, though, because industry providers have not yet agreed on common, supportable migration standards.
OpenStack is attempting to create such standards through its project, but open source competitors such as Citrix's Apache CloudStack are coalescing around AWS as the de facto standard.