February 23, 2009, 11:24 AM — Let's talk about managing the cloud. Beware, this argument gets heated.
In this series, I've been discussing the common concerns and objections to cloud computing. I started this due to an interesting phenomenon that I noted: in conversations with a number of people from large technology companies that had announced cloud computing initiatives (and these people were actually involved in those initiatives), a number of them intimated that cloud computing isn't ready for enterprises, or that enterprises aren't ready for cloud computing. Intrigued, I decided to probe further into why they thought there were adoption barriers and eventually identified five key issues. In Part 1, I addressed the inability to conveniently migrate existing applications into cloud infrastructures. In Part 2, I tackled legal, regulatory, and business risk. Part 3 discussed concerns about cloud SLAs (or lack thereof). And in Part 4, I explored concerns regarding cloud TCO. In this posting, I will discuss the final objection brought up regarding cloud computing: the lack of sophisticated system management tools.
As stated by one person, enterprises have existing system management tools in place and the new cloud providers don't integrate with them, therefore you can't manage a cloud system. I think that's a bit harsh. All of the cloud providers offer tools to manage systems running in their environments, and there are startups that provide even more sophisticated tools for some cloud environments. In the case of Amazon EC2, for example, Amazon offers an Ajax-powered web interface for its cloud environment. There is also a very useful Firefox plug-in called Elasticfox. In case those tools are insufficient, companies like RightScale offer systems that provide finer-grained control than is available in the free tools.
Notwithstanding the existence of these cloud system management tools, an issue remains: the dominant system management tools cannot manage a mixed environment that incorporates existing data centers as well as an external cloud environment. In short, end users are faced with using two different system management tools, and the cloud tool cannot draw on the human capital represented by employees' training and experience with the existing one. And, if there's any truism in enterprise IT, it's that IT groups don't like having to carry two of anything if they can help it.
However, is cloud computing a case in which the challenge of carrying two management systems is likely to prove a dealbreaker? After all, as VMware came into data centers, IT groups had to adapt to using two management systems: one for managing VMware systems and one for managing everything else. In some sense, that situation was even worse than the internal/cloud split, because IT groups had to use two different management systems on a single virtualized device: one to manage the hardware and a second to manage the hypervisor and the virtualized guests running on that hardware. And IT groups seemed to manage, because the benefit of applying virtualization outweighed the inconvenience of using two management tools.
And this highlights one of the issues relating to system management tools: they're really more effective at managing hardware devices than software systems. They may manage operating systems as well as hardware, but they typically aren't nearly as effective for applications, especially multi-tier applications that span more than one box. This explains the rise of products like Splunk, which offers the ability to correlate log files across systems and software components to enable root cause analysis of system failures. Splunk only exists because the incumbent management tools don't really address these types of application management needs.
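To make the idea of correlating log files across tiers concrete, here is a minimal sketch in Python. It is not how Splunk works internally. The log format, the tier names, and the helper functions are all hypothetical, invented for illustration. It simply merges entries from several components into one timeline and lists the errors that occurred shortly before a failure.

```python
from datetime import datetime, timedelta

# Hypothetical log lines from three tiers; the "timestamp level message"
# format is an assumption made for this sketch.
LOGS = {
    "web": [
        "2009-02-23T11:24:01 INFO request /checkout started",
        "2009-02-23T11:24:03 ERROR upstream timeout calling app tier",
    ],
    "app": [
        "2009-02-23T11:24:02 ERROR database connection refused",
    ],
    "db": [
        "2009-02-23T11:23:58 ERROR disk full on data volume",
        "2009-02-23T11:30:00 INFO nightly vacuum finished",
    ],
}


def parse(source, line):
    """Split one log line into (timestamp, source, level, message)."""
    timestamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(timestamp), source, level, message


def merged_timeline(logs):
    """Combine every component's entries into one list ordered by time."""
    events = [parse(src, line) for src, lines in logs.items() for line in lines]
    return sorted(events, key=lambda event: event[0])


def errors_before(logs, failure_time, window_seconds=10):
    """Return ERROR entries from any component in the window before a failure."""
    start = failure_time - timedelta(seconds=window_seconds)
    return [
        event
        for event in merged_timeline(logs)
        if event[2] == "ERROR" and start <= event[0] <= failure_time
    ]


# The user-visible failure is the web tier's timeout.
failure = datetime(2009, 2, 23, 11, 24, 3)
suspects = errors_before(LOGS, failure)
for when, source, level, message in suspects:
    print(when.isoformat(), source, message)
```

Read top to bottom, the merged list puts the earliest error first, which is usually the best place to start looking for a root cause. A hardware-oriented management tool watching each box in isolation would not line these entries up.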
As for the plans of existing system management tool providers, so far the incumbents' approach to cloud computing has been to emphasize how well they will help manage internal clouds. In this view of cloud computing, enterprises will focus on making their own data centers more "cloud-like." This refers to making IT operations more agile: quicker provisioning of hardware via a web interface; use of orchestration software to provision an entire system infrastructure, including multiple pieces of hardware as well as software components, through a single transaction rather than repetitive steps; and (perhaps) sophisticated chargeback billing based on usage.
The attractiveness of this approach is that it plays to the incumbents' strengths: managing hardware within a controlled environment, with products sold to individual enterprises to manage a single environment. Helping to manage an internal cloud represents a logical, more easily achieved increment of functionality that aligns with current market realities. I will be addressing the topic of internal clouds in an upcoming post, as I believe the topic deserves a lot of inspection, but the question at hand is the system management practices for external clouds.
Turning to external cloud management, using existing system management tools as the yardstick may be quite inappropriate. They shine in managing hardware, and in cloud environments the end user has no hardware to manage. So criticizing external clouds because they can't be managed by incumbent system management tools misses the mark. The cloud providers take care of managing the hardware, and given the scale they work at, they have very sophisticated systems they've purpose-built for their environments. In fact, most of them approach the whole question of hardware very differently than today's system management tools assume.
The current crop of system management tools was developed at a time when hardware was very expensive and difficult to replace. Today's enterprise data centers continue those assumptions and practices even though the cost of hardware has plummeted. The reason is that even when the hardware itself is cheap, the total cost of an individual piece of hardware can still be high: the overhead of additional data center space and manual operations makes even cheap hardware expensive to manage. Consequently, even though system management tools are rooted in a world of expensive hardware that must be watched closely because a hardware failure causes an application outage, their use still makes sense in data center environments, because the constraints of those environments keep each hardware instance expensive.
External clouds (clouds based in external facilities provided by the new breed of provider) are designed around very different system assumptions. Because hardware is so cheap, cloud providers' system management practices do not focus on nursing individual pieces of hardware as though they were precious. Instead, hardware is assumed to be inherently frail and subject to failure. And at the scale these providers operate, hardware failure is a common occurrence. For example, with hundreds of thousands of disk drives present, failed drives are an everyday occurrence. Motherboard failures that cause an entire machine to go bad happen all the time. At that frequency of failure, rather than try to preserve individual pieces of hardware, or create specialized environments to preserve a small redundant set of hardware (e.g., SAN environments that can manage dozens or hundreds of drives, but are economically unsustainable at thousands of drives' worth of data), cloud providers build system architectures that are inherently redundant and can tolerate individual hardware failure without consequence.
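A quick back-of-the-envelope calculation shows why failure becomes routine at this scale and why redundancy is the sensible response. The sketch below is in Python. The drive count, the annual failure rate, and the per-copy loss probability are illustrative assumptions, not figures from any provider. The replication estimate also assumes copies fail independently, which real systems only approximate.

```python
def expected_daily_failures(drive_count, annual_failure_rate):
    """Average number of drive failures per day across the whole fleet."""
    return drive_count * annual_failure_rate / 365


def chance_all_copies_lost(per_copy_loss_probability, copies):
    """Probability that every replica is lost, assuming independent failures."""
    return per_copy_loss_probability ** copies


# Assumed fleet: 200,000 drives with a 3% annual failure rate (hypothetical).
daily = expected_daily_failures(200_000, 0.03)
print(f"Expected failed drives per day: {daily:.1f}")

# Assumed 0.1% chance of losing any one copy before it can be re-replicated.
for copies in (1, 2, 3):
    risk = chance_all_copies_lost(0.001, copies)
    print(f"{copies} copies -> chance of losing the data: {risk:.0e}")
```

Under these assumptions, drives fail every single day, so paging an operator for each one makes little sense. Keeping several copies on cheap hardware drives the chance of actually losing data down by orders of magnitude, without any one machine needing special care.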
In this environment, existing system management tools are not appropriate, for the provider or the user. So criticizing clouds because you can't use a common system management tool misses the point.
The real issue is whether cloud computing is analogous to virtualization in terms of its benefits. Does cloud computing provide enough benefit to make the inconvenience of using two system management tools worthwhile? With respect to virtualization, the benefits were so obvious and so sizable that IT organizations eagerly embraced it, despite its inconveniences. But what about cloud computing? Will IT groups embrace it despite its inconveniences, the subject of this series of postings? That question turns on the potential TCO of cloud computing, and as I have noted, the TCO picture for cloud computing is not nearly as bleak as some people assume. (That's another topic I'll be returning to in the future, as I think it deserves further discussion.)
However, even people who denigrate cloud computing recognize that there are certain use cases in which it shines: transient-workload applications that do not require significant integration with existing applications. The New York Times EC2 application is a poster child for this use of cloud computing. Given the financial benefit of leveraging EC2 for applications like these, I don't think there's any doubt that IT organizations will accept the pain of using a second system management tool. While it is inconvenient, I predict a future for IT groups that mandates multiple system management tools.
With this, we've come to the last of the main objections to cloud computing. In next week's conclusion, I'll wrap up the series and discuss how I think IT groups will address these objections going forward. I think it's safe to say, however, that the answer is likely to be figuring out how to meet the objections and mitigate them, rather than rejecting the use of cloud computing based on their existence.