Ensure cloud application resilience the Netflix way

By Bernard Golden, CIO |  Cloud Computing, Amazon Web Services, Netflix

Obviously, the new model of application architectures and topologies means that the traditional approach to application resilience (install the application, then don't change it for as long as possible) is no longer workable. In fact, in a fascinating podcast I recently listened to, Richard Cook, an expert in complex systems, claimed that there is no longer such a thing as a stable system. Between system changes, maintenance schedules, operations activities and user interactions, one cannot even apply a model of "stability." The assumption that applications are simple collections of a limited number of components with well-understood execution paths and consistent performance characteristics is no longer tenable.

Just as obviously, the traditional way of assessing application resiliency, checking that the application is up and that no user is actively complaining, is no longer workable either.

Cloud Application Performance Management Can Only Do So Much

To address this problem, companies typically turn to a class of tools that offer application performance management, or APM. These tools mimic end user interaction to evaluate user experience, perform detailed monitoring of software components, and provide analytics across time to identify trends in application performance. This approach to resilience might be called "If it's not broke, watch and get ready to fix it."
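To make the APM idea concrete, here is a minimal sketch of the two pieces described above: a synthetic check that times a simulated user action, and a crude trend measure over the collected samples. It does not reflect any particular APM product; the names run_synthetic_check, latency_trend, is_degrading and fake_login are hypothetical, and the window and threshold values are arbitrary assumptions chosen for illustration.

```python
import statistics
import time
from typing import Callable, List


def run_synthetic_check(action: Callable[[], None]) -> float:
    """Run one simulated user action and return its latency in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def latency_trend(samples: List[float], window: int = 3) -> float:
    """Mean of the last `window` samples minus mean of the first `window`.

    A positive value suggests the action is getting slower over time.
    """
    if window < 1 or len(samples) < 2 * window:
        raise ValueError("need window >= 1 and at least 2 * window samples")
    recent = statistics.mean(samples[-window:])
    baseline = statistics.mean(samples[:window])
    return recent - baseline


def is_degrading(samples: List[float], threshold: float, window: int = 3) -> bool:
    """Flag the action when its latency has drifted up by more than `threshold`."""
    return latency_trend(samples, window) > threshold


def fake_login() -> None:
    """Hypothetical stand-in for a scripted user interaction."""
    time.sleep(0.001)


if __name__ == "__main__":
    # Collect a handful of samples, then ask whether latency is trending up.
    collected = [run_synthetic_check(fake_login) for _ in range(6)]
    print("degrading:", is_degrading(collected, threshold=0.05))
```

Note what the sketch can and cannot do: it only notices drift in an interaction someone thought to script in advance, which is exactly the limitation of the watch-and-fix approach.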

This is all well and good. But it's not enough. While understanding how the application is operating is helpful for managing typical use patterns, no APM tool can help you address the problems that will arise from the complexity and continuous change associated with today's applications.

Simply put, it's not enough to run the app, attach APM and expect things to go well, or even to expect the problems that will arise to be well-bounded. The application elements that will cause problems are unknown, and the triggering events are unpredictable. Therefore, the application problems one can expect to see in today's environments require more than waiting and responding when a problem arises.


Originally published on CIO.