Ensure cloud application resilience the Netflix way

By Bernard Golden, CIO

One of the most heated topics in cloud computing today is the Service Level Agreement (SLA). From the highly charged discussion of the subject, you might expect that the primary factor affecting application availability is whether a cloud service provider (CSP) is willing to commit to a rigorous SLA.

In fact, the application itself is the biggest factor affecting application availability. That's what all the furor about cloud SLAs is really about: how available are my applications? That availability is what matters to me, and an SLA is only an indirect, loosely correlated means to that end. More application outages are caused by what's going on inside the application than by infrastructure failure, and this is becoming even more true as applications grow increasingly complex.

Unlike the simple client-server or even straightforward multi-tier, single-machine-per-tier applications of the past 20 years, today's applications are a complicated mélange of multi-tier, horizontally scaled instances (that is, virtual machines) containing aggregations of software packages, calling internal and external services, and operating under highly variable load conditions that cause application topologies to shift constantly as new instances join and leave the application. The old model of resilience ("If it's not broke, don't touch it") just won't work in this environment.

There's No Such Thing as a Stable System

It is the nature of such applications that complex interactions between application components execute thousands of times per second. Given the state of the user's session, the actions the user takes and the then-current topology of the application, the same execution path for a user interaction may well not recur for days at a time.

In fact, given the shifting nature of the entire application, it would not be wrong to say that the same execution path will never be followed a second time. Compared with this complexity, the CSP infrastructure is highly unlikely to be the only, or even the primary, cause of application outages.
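To make the combinatorics concrete, here is a minimal Python sketch built on invented assumptions. It models a hypothetical three-tier application in which a request touches exactly one instance per tier. The tier names `web`, `app` and `cache` and the instance counts are illustrative, not taken from the article. The sketch counts the instance-level paths, then applies one scaling event to show that it retires some paths and creates others that have never been exercised.

```python
import itertools

# Hypothetical topology (illustrative assumption): tier name -> instances
# currently serving that tier. A request is assumed to visit one instance per tier.
topology = {
    "web": [f"web-{i}" for i in range(10)],
    "app": [f"app-{i}" for i in range(20)],
    "cache": [f"cache-{i}" for i in range(5)],
}


def count_paths(topo):
    """Number of distinct instance-level paths: product of instances per tier."""
    total = 1
    for instances in topo.values():
        total *= len(instances)
    return total


def enumerate_paths(topo):
    """Every (web, app, cache) instance combination a request could traverse."""
    return set(itertools.product(*topo.values()))


before = enumerate_paths(topology)
print(count_paths(topology))  # 10 * 20 * 5 = 1000

# One autoscaling event: instance app-3 leaves and a new instance app-20 joins.
topology["app"] = [i for i in topology["app"] if i != "app-3"] + ["app-20"]
after = enumerate_paths(topology)

print(count_paths(topology))  # still 1000 paths in total...
print(len(before - after))    # ...but 50 paths no longer exist
print(len(after - before))    # and 50 paths have never been exercised before
```

Even this toy topology yields 1,000 paths, and swapping a single instance creates 50 paths that have never run. Real applications multiply this further by session state, user actions and calls to internal and external services.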

Originally published on CIO.