This vendor-written opinion has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
The powerful storms that recently hit the mid-Atlantic region caused electrical outages that in turn disrupted Amazon's EC2 services. The impact on the consumers and businesses that depend on the Amazon cloud has been well chronicled.
Amazon is a leader in reliability, so if an outage can happen to Amazon, it can and will happen to any service provider. The point is that to maintain business continuity, enterprises need to take responsibility for planning their own outage contingencies.
The problem is that business processes, applications and computing infrastructure are too intertwined and dependent on each other. If the infrastructure isn't configured just right or is unavailable, the business process stops. The industry has made great strides in abstracting the physical computing infrastructure from the applications it supports. Amazon and VMware have created tremendous value and built businesses by abstracting (or insulating) applications and users from hardware diversity and failures.
However, the industry has only started to abstract the business process from the applications and infrastructure that support it. To work around an Amazon EC2 outage, organizations need to use more than one provider so they avoid a single point of failure. Yet for that strategy to succeed, the business needs the ability to reroute and rerun the process in its own data center or with an alternative service provider. This is where higher-level process automation comes in.
The recent outages at Royal Bank of Scotland, BATS Global Markets and others demonstrate an inability not only to abstract the process from the infrastructure, but also to see the interdependencies and failures that plague complex IT systems. In those outages, it took minutes to fix the problem but days to find it.
Process automation that keeps track of the complex interdependencies between applications, infrastructure and business workflows can help identify, or even predict, problems. Then, in the case of an unavoidable outage, the affected business workflows could be rerouted to an available data center.
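To make the idea concrete, the following is a minimal Python sketch. It assumes the automation layer keeps a simple map of what each workflow, application and hosting site depends on. The component names (`order-processing`, `ec2-us-east`, `onprem-dc` and so on) and the helpers `impacted_by` and `reroute` are hypothetical and are not any product's API. The sketch walks the dependency edges in reverse from a failed component to find every workflow it affects. It then assigns each affected workflow to the first alternative site still marked healthy.

```python
from collections import deque

# Hypothetical dependency map: each key lists the components it depends on.
# Workflows depend on applications; applications depend on hosting sites.
DEPENDS_ON = {
    "order-processing": ["billing-app", "inventory-app"],
    "nightly-reporting": ["reporting-app"],
    "billing-app": ["ec2-us-east"],
    "inventory-app": ["ec2-us-east"],
    "reporting-app": ["onprem-dc"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk upward from the failed component.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    # Breadth-first search over the inverted graph.
    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def reroute(workflows, failed, sites, healthy, depends_on):
    """Assign each impacted workflow to the first healthy site other than `failed`.

    Returns {workflow: site}; the site is None if no healthy alternative exists.
    Unaffected workflows are left out of the plan and keep running where they are.
    """
    hit = impacted_by(failed, depends_on)
    fallback = next((s for s in sites if s in healthy and s != failed), None)
    return {wf: fallback for wf in workflows if wf in hit}


if __name__ == "__main__":
    workflows = ["order-processing", "nightly-reporting"]
    sites = ["ec2-us-east", "onprem-dc", "backup-provider"]
    print(sorted(impacted_by("ec2-us-east", DEPENDS_ON)))
    # ['billing-app', 'inventory-app', 'order-processing']
    print(reroute(workflows, "ec2-us-east", sites, {"onprem-dc", "backup-provider"}, DEPENDS_ON))
    # {'order-processing': 'onprem-dc'}
```

The sketch only checks whether an alternative site is healthy. A real orchestration layer would also confirm that the site can host each workflow's applications and data before moving it there.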
Most process automation done today handles low-level IT administrative tasks: provisioning servers, running backup or startup routines, and generally performing infrastructure tasks that require little decision-making that could affect the line of business. This is necessary and important, but it is not sufficient to preserve the user experience or business process integrity in increasingly complex IT environments where, statistically, something is always failing.
Enterprises must step up their IT process automation to the point where they can manage business workflows, not just servers or IT tasks.
If the businesses dependent on Amazon had these capabilities, they would have drastically reduced the outages they experienced. Orchestrating business workflows and associated data across applications and infrastructure is easier said than done. However, it can be, and is being, done by many enterprises to assure service levels.
The ability to roll back failed system updates to previous working versions, to spot process failures before they create an unrecoverable backlog, and to run a workflow on newly provisioned environments is the type of higher-level process automation that shields users and the business from inevitable outages.
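Two of those capabilities can be sketched in a few lines of Python under stated assumptions. First, each application keeps a hypothetical release history that records whether each version passed a health check. Second, the workflow's queue depth is sampled once per minute. `Deployment`, `rollback_target` and `backlog_at_risk` are illustrative names only. The threshold rule is one simple choice among many: flag the backlog if it is growing, or if it cannot be drained before the deadline at a known rate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Deployment:
    """Hypothetical release history for one application, oldest release first."""
    history: List[Tuple[str, bool]] = field(default_factory=list)

    def deploy(self, version: str, passed_health_check: bool) -> None:
        self.history.append((version, passed_health_check))

    def rollback_target(self) -> Optional[str]:
        """Return the most recent version that passed its health check, if any."""
        for version, healthy in reversed(self.history):
            if healthy:
                return version
        return None


def backlog_at_risk(queue_depths: List[int], drain_rate: float, minutes_left: float) -> bool:
    """Flag a backlog before it becomes unrecoverable.

    queue_depths: pending-job counts sampled once per minute, oldest first.
    drain_rate: jobs per minute the workflow can clear when healthy.
    minutes_left: minutes until the business deadline (e.g. end of a batch window).
    """
    if len(queue_depths) < 2:
        return False  # not enough samples to see a trend
    # Average net change per minute; positive means work arrives faster than it completes.
    trend = (queue_depths[-1] - queue_depths[0]) / (len(queue_depths) - 1)
    # At risk if falling behind, or if the current backlog cannot be cleared in time.
    return trend > 0 or queue_depths[-1] > drain_rate * minutes_left


if __name__ == "__main__":
    app = Deployment()
    app.deploy("1.4", True)
    app.deploy("1.5", True)
    app.deploy("1.6", False)  # failed update
    print(app.rollback_target())  # 1.5
    print(backlog_at_risk([100, 120, 150], drain_rate=50, minutes_left=60))  # True
    print(backlog_at_risk([300, 200, 100], drain_rate=50, minutes_left=60))  # False
```

Flagging any growth is deliberately conservative. A production scheduler would likely smooth the samples and tune the thresholds for each workflow.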
As enterprises get more serious about higher-level process automation, they will spend more time abstracting their processes from specific infrastructures and application environments. This abstraction is not only key to quickly managing an outage, it's also key to efficiently dealing with the growing IT complexity created by today's hyper-competitive business environment.
Whether IT is ready or not, the business is doing whatever it takes to respond to changing market and customer demands by pushing IT to develop new applications at a faster pace and deploy them quickly (on highly virtualized infrastructure). Add that up and you get a lack of organization, infrastructure sprawl, and more fluidity as to where applications actually run, resulting in IT complexity and skyrocketing application-to-infrastructure dependencies.
This is the point at which the need for process abstraction and automation becomes acute, because these interdependencies, each a potential breaking point, are beyond human ability alone to manage. IT organizations are now forced to deal with these new realities while cloud, Big Data, DevOps and ITaaS pressures get added to the mix in the name of business agility. With all these moving parts, something needs to be stable and act as the IT backbone. It is increasingly obvious that this backbone is the process and process control.
The days of designing the process to accommodate the shortcomings of infrastructure are over. Enterprises must abstract, insulate and protect their business processes from the applications and infrastructures that support them. The need for improved IT process automation is rising as the service and brand impact of online outages grows.
UC4 Software is the world's largest independent IT Process Automation software company. UC4 automates tens of millions of operations a day for over 2,000 customers worldwide. Rethink IT automation at www.UC4.com.
This story, "Dealing with cloud outages -- are we ready?" was originally published by Network World.