Fixing Apps In the Heat of the Moment Is a Bad Idea
The basic philosophy behind the Simian Army is that waiting for a failure, and then relying on monitoring tools for forensic analysis while the application is down, addresses the issue too late. For applications that a company depends on for service delivery and revenue generation, after-the-fact mitigation is unacceptable. It's far better to force problem scenarios, evaluate how the application responds, and improve the components or operations needed to keep running in the face of resource failure. Netflix tends to do this during low-load periods, when the problems it causes or the extra load it imposes will not overtax the system.
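To make the idea concrete, here is a minimal Python sketch of forcing a failure during a quiet period and then checking whether the service still holds up. It is an illustration only, not Netflix's code or the Simian Army's API: the names Instance, Service, is_low_load, and run_experiment, the 2 a.m. to 6 a.m. quiet window, and the min_healthy threshold are all assumptions invented for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical server in the pool."""
    name: str
    healthy: bool = True


@dataclass
class Service:
    """A hypothetical service that needs min_healthy instances to keep serving."""
    instances: list
    min_healthy: int = 2  # assumed threshold, chosen for illustration

    def healthy_count(self):
        return sum(1 for inst in self.instances if inst.healthy)

    def is_serving(self):
        return self.healthy_count() >= self.min_healthy


def is_low_load(hour, quiet_hours=range(2, 6)):
    """Assume 02:00-05:59 is the low-load window; real systems would use metrics."""
    return hour in quiet_hours


def run_experiment(service, hour, rng):
    """Kill one random healthy instance, but only during the quiet window."""
    if not is_low_load(hour):
        return {"ran": False, "victim": None, "survived": None}
    candidates = [inst for inst in service.instances if inst.healthy]
    if not candidates:
        return {"ran": False, "victim": None, "survived": None}
    victim = rng.choice(candidates)
    victim.healthy = False  # the forced failure
    # Evaluate how the application responds while engineers are calm and available.
    return {"ran": True, "victim": victim.name, "survived": service.is_serving()}


if __name__ == "__main__":
    pool = Service([Instance("a"), Instance("b"), Instance("c")])
    print(run_experiment(pool, hour=3, rng=random.Random()))
```

A result with survived set to False is the useful outcome here: it exposes a weakness at a time of the team's choosing rather than during a real outage.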
IT is moving toward a world where Netflix-like application design and operational topologies will become the norm. Traditional resilience approaches and mitigation strategies won't work or won't be acceptable in this world. In any event, attempting to address problems in the heat of a downtime situation is a poor approach, since stress inevitably degrades analytical capabilities and solution quality.
Netflix is making a major open source push by releasing its various application tools under a sharing-friendly license. If you're looking to the future, evaluate what Netflix is doing and consider how it can be applied in your environment.
Bernard Golden is the vice president of Enterprise Solutions for enStratus Networks, a cloud management software company. He is the author of three books on virtualization and cloud computing, including Virtualization for Dummies. Follow Bernard Golden on Twitter @bernardgolden.