April 28, 2011, 11:48 AM – After a high-profile disaster like a days-long service outage that cost many customers a lot of money and permanently lost data for others, most companies might shorten up the technology vision and focus on fixing short-term problems.
Not Amazon CEO Jeff Bezos. In an opinion piece in today's San Francisco Chronicle, Bezos lays out his case for advanced IT R&D in a purely commercial organization.
The piece is mainly firefighting – an attempt to defend Amazon's spending decisions to shareholders following a quarterly earnings report that showed revenue up 38 percent from a year ago, but earnings down 33 percent.
That makes for an uuugly financial report and as much disgust among financial geeks as the Amazon EC2 outage caused among technical ones.
Bezos didn't address the outage, but part of the piece did give his macro-view of the architecture of both Amazon's internal systems and its public cloud service, including the storage and database services that caused the outage:
>"State management is the heart of any system that needs to grow to very large size. Many years ago, Amazon’s requirements reached a point where many of our systems could no longer be served by any commercial solution: our key data services store many petabytes of data and handle millions of requests per second.
To meet these demanding and unusual requirements, we’ve developed several alternative, purpose-built persistence solutions, including our own key-value store and single table store.
To do so, we’ve leaned heavily on the core principles from the distributed systems and database research communities and invented from there.
The storage systems we’ve pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost.
To achieve their ultra-scale properties these systems take a novel approach to data update management: by relaxing the synchronization requirements of updates that need to be disseminated to large numbers of replicas, these systems are able to survive under the harshest performance and availability conditions.