VMware's virtualization solutions promise greater hardware utilization and flexibility. Yet, like all virtualization technologies, they carry a certain amount of risk. Stack too many servers onto a host and you may achieve significant financial gain, but you incur the risk of operational incidents and performance problems in your production environment. Ignore the business-related aspects of your production environment that aren't present in the lab and you may put critical apps at great risk. Rely too heavily on some of the advanced automation features of VMware without the proper planning and you could wind up with bigger problems than those you were trying to solve in the first place.
Proper planning is at the heart of any technology optimization initiative, and this applies to VMware as well. Large-scale virtualization necessitates a data-driven approach that carefully evaluates elements such as business considerations, technical constraints, and workload patterns. Things in the VMware world are very fluid, so it's important not only to achieve an optimal initial placement of virtual machines, but also to understand how to keep the environment optimized.
Virtualization planning can become very complex without the proper planning tools, but regardless of the approach, organizations should ensure that they follow some basic guidelines during the process.
Watch for Technical Factors that May Introduce Risk: Be careful when combining servers that have differing configurations, diverse underlying platforms, or varying network/storage connectivity. Combining servers that touch too many networks onto a single physical host can drive up costs through the increase of NICs and PCI extenders (blade racks are particularly sensitive to this). Be sure to uncover any hardware or configurations of interest, such as SAN controllers, token ring cards, IVRs, proprietary daughterboards, direct-connect printers, or other items that are not part of the standard build. This process, called variance analysis, reveals hardware configuration affinities and "outliers", which ultimately helps avoid any interruption of critical business services during the virtualization process.
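As a rough illustration of variance analysis, the sketch below flags attribute values that only a minority of servers share. The inventory format, the attribute names, and the 50% threshold are all assumptions made for this example; a real inventory would come from a discovery tool and carry many more attributes.

```python
from collections import Counter

def variance_analysis(inventory, threshold=0.5):
    """Return, per server, the attribute values that deviate from the common build.

    inventory: dict of server name -> dict of attribute -> value (hypothetical format).
    A value is treated as standard when at least `threshold` of servers share it.
    """
    total = len(inventory)
    # Count how often each (attribute, value) pair occurs across the inventory.
    counts = Counter(
        (attr, value)
        for config in inventory.values()
        for attr, value in config.items()
    )
    outliers = {}
    for server, config in inventory.items():
        odd = {
            attr: value
            for attr, value in config.items()
            if counts[(attr, value)] / total < threshold
        }
        if odd:
            outliers[server] = odd
    return outliers

# Illustrative inventory, not real data.
inventory = {
    "app01": {"os": "Windows 2003", "nics": 2, "special_hw": "none"},
    "app02": {"os": "Windows 2003", "nics": 2, "special_hw": "none"},
    "app03": {"os": "Windows 2003", "nics": 2, "special_hw": "none"},
    "ivr01": {"os": "Windows 2003", "nics": 2, "special_hw": "telephony card"},
    "db01":  {"os": "Windows 2003", "nics": 6, "special_hw": "none"},
}
print(variance_analysis(inventory))
```

Here the telephony card on ivr01 and the six network connections on db01 surface as outliers that deserve a closer look before those servers are placed on a shared host.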
Consider the Key Business Constraints that Govern the Environment: Consider real-world business constraints, such as availability targets, maintenance windows, application owners, compliance restrictions, disaster recovery strategies, and other business sensitivities. Most small-scale virtualization planning doesn't go beyond simple workload analysis, yet any foray into larger production environments will show that it is very important to dig much deeper. For example, it's not unheard of to combine virtualization candidates based solely on utilization data and end up with a dysfunctional environment where there is not a single time in the calendar when the physical server can actually be shut down for maintenance. Considering the maintenance windows of the applications in the planning phase will avoid such problems, and it is not always wise to rely on VMotion to get out of a jam. Likewise, mixing different availability levels can either create risk or waste expensive hardware.
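The maintenance window problem can be checked mechanically. The following is a minimal sketch that assumes each application's weekly window is expressed as a set of hour-of-week slots; the helper names and the sample windows are hypothetical.

```python
def hours(day, start, end):
    """Hour-of-week slots for a window on `day` (0 = Monday) from start to end hour."""
    return {day * 24 + h for h in range(start, end)}

def common_window(windows):
    """Intersect the weekly maintenance windows of every VM placed on one host."""
    slots = None
    for vm_slots in windows.values():
        slots = set(vm_slots) if slots is None else slots & vm_slots
    return slots or set()

# Illustrative windows for three applications proposed for the same host.
host_a = {
    "payroll": hours(5, 22, 24) | hours(6, 0, 6),  # Sat 22:00 to Sun 06:00
    "crm":     hours(6, 2, 8),                     # Sun 02:00 to Sun 08:00
    "trading": hours(2, 1, 3),                     # Wed 01:00 to Wed 03:00
}

shared = common_window(host_a)
print(len(shared), "shared maintenance hours per week")
```

With all three applications on one host the intersection is empty, which is exactly the situation described above: the host can never be taken down. Dropping the trading application from the placement leaves a four-hour shared window early on Sunday.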
Tackle the Political and Financial Ramifications of Sharing Infrastructure: Another issue to consider is the politics of these consolidation decisions. Application owners may have real or perceived reasons why they cannot share infrastructure and these often translate into additional constraints and/or what-if analysis in order to resolve the issues. In addition, most chargeback models aren't sophisticated enough to deal with virtualized infrastructure and will break down if resource sharing crosses certain boundaries. Using affinity regions based on departments and application owners may be a wise decision in cases where political or financial considerations pose a challenge.
Be Exhaustive When it Comes to Workload Patterns and Personalities: Everyone wants to maximize savings, but there is a trade-off between risk and return when virtualizing existing environments. What is acceptable in a lab is usually not the same as what is required in production, and the risk of performance degradation is often a key consideration when determining the target utilization in a virtual environment. It's vital that organizations understand this and properly evaluate workload patterns to determine their own comfort with savings, stacking ratios and operational risk levels. Some of the most important aspects of workload analysis, such as complementary pattern detection and time-shift what-if analysis, are often overlooked when determining if workloads can be combined. Looking at these areas in depth, and across all the major CPU, I/O and resource capacity "food groups", helps ensure that you've maximized utilization while leaving enough headroom to cushion peak demands on the infrastructure.
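To make complementary pattern detection and time-shift analysis concrete, here is a small sketch. It assumes each workload is a list of CPU demand samples taken at the same intervals and expressed as a percentage of one target host; the profiles and function names are invented for illustration.

```python
def stacking_summary(workloads):
    """Compare sum-of-peaks sizing with the peak of the combined profile."""
    combined = [sum(samples) for samples in zip(*workloads.values())]
    return {
        "sum_of_peaks": sum(max(profile) for profile in workloads.values()),
        "combined_peak": max(combined),
    }

def shift(profile, steps):
    """Rotate a profile later in time by `steps` samples (time-shift what-if)."""
    steps %= len(profile)
    return profile[-steps:] + profile[:-steps]

# Six samples per day (one every four hours), illustrative values only.
workloads = {
    "batch": [70, 60, 10, 10, 10, 20],  # busy overnight
    "web":   [10, 10, 50, 60, 55, 20],  # busy during the day
}

print(stacking_summary(workloads))

# What if the batch job were rescheduled eight hours later?
what_if = dict(workloads, batch=shift(workloads["batch"], 2))
print(stacking_summary(what_if))
```

As scheduled, the two workloads are complementary: their individual peaks add up to 130, but the combined profile never exceeds 80. Moving the batch job into the daytime pushes the combined peak to 120, which would overload the host.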
Understand the Overhead of Virtualization: When analyzing for VMware consolidation in particular, you need to look at the overhead created by the virtual machine. Unlike physical servers, VMware virtual servers create CPU overhead when data is sent to the disk or over the network. Typically organizations build in a fixed-percentage overhead when planning virtual environments, but this approach can sell systems short. The best approach is to properly analyze I/O rates and project a more accurate utilization curve that factors in application workload as well as the true overhead introduced by virtualization.
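The difference between the two approaches can be sketched as follows. The overhead coefficients here are placeholders chosen purely for illustration, not measured VMware figures; in practice they would have to be derived from benchmarks on the target platform.

```python
def fixed_overhead(cpu_pct, overhead=0.10):
    """Project virtualized CPU by adding a flat percentage to every sample."""
    return [c * (1 + overhead) for c in cpu_pct]

def io_based_overhead(cpu_pct, disk_iops, net_mbps,
                      cpu_per_kiops=2.0, cpu_per_100mbps=1.5):
    """Project virtualized CPU by adding a cost proportional to I/O activity.

    cpu_per_kiops and cpu_per_100mbps are hypothetical coefficients: CPU
    percentage points consumed per 1000 disk IOPS and per 100 Mbit/s of traffic.
    """
    return [
        c + d / 1000 * cpu_per_kiops + n / 100 * cpu_per_100mbps
        for c, d, n in zip(cpu_pct, disk_iops, net_mbps)
    ]

# Two samples with identical native CPU but very different I/O rates.
cpu = [20, 20]
disk = [500, 8000]   # IOPS
net = [50, 600]      # Mbit/s

print(fixed_overhead(cpu))
print(io_based_overhead(cpu, disk, net))
```

Under these assumed coefficients, the flat model projects roughly 22% for both samples, while the I/O-based model projects 21.75% for the quiet sample and 45% for the I/O-heavy one. A fixed percentage hides exactly the cases that matter most.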
Analyze Constraints Together, Not in Isolation: Don't plan virtualization based on any one constraint viewed in isolation. It's important to consider all the critical constraints together when choosing targets. Taking a one-dimensional analysis of workload, for example, will not only limit your success, but can cause critical performance, security and compliance issues. Organizations should be taking a multi-dimensional look at the net effect of all of the key constraints applied to the pool of potential resources in order to determine the optimal path to a virtualized infrastructure.
Don't Go Backwards When it Comes to Security and Compliance: Ensure that as machines are virtualized they are not breaching compliance rules. For example, regulations regarding information sharing between divisions within financial services or healthcare operations necessitate that certain applications and databases be kept separate. Keeping systems apart from their disaster recovery counterparts or cluster/replication peers is also critical. In addition, security zones should be maintained unless the organization has a clear mandate to redefine what can cohabitate in an environment and/or on a physical system.
Understand the New Roles Introduced by Virtualization: ESX administrators are a new breed of IT professional, and the fact that they often have access to the disk images of multiple virtual servers tends to give them broad visibility into applications and their data. This sometimes creates a "super super" user role that is unprecedented in many environments, and has the potential to violate regulatory and internal compliance rules. Proper virtualization analysis and planning looks for these vulnerabilities and provides a risk matrix that helps the organization ensure continued compliance.
Don't Abuse VMotion: VMotion is an extremely powerful technology that will undoubtedly revolutionize the way many environments are managed. That said, it is not wise to use it as a crutch and rely on it to compensate for poor planning or inadequate management of an environment. Purposefully creating sub-optimal VM placements with the expectation that you can VMotion your way out of trouble is rarely a good strategy, particularly in production environments. This creates a 'try it and see' culture that encourages people to try out different combinations and assume they can just reverse these out if they don't work.
Lay Explicit Ground Rules for DRS: VMware's Distributed Resource Scheduler (DRS) automatically migrates virtual machines according to workload balancing criteria, and because it is not inherently aware of the technical and business constraints on an environment, it can scramble systems from both a technical and a business perspective. To combat this effect, DRS supports affinity and anti-affinity rules that identify which systems should be kept together and which should be kept apart. While good in principle, these rules are difficult to define without a proper understanding of the relevant constraints. A convenient byproduct of the constraint-based analysis described above is a complete map of all relevant affinities and anti-affinities in a server cluster, providing rules that eliminate potential conflicts and ensure that security zones, business constraints, compliance issues, disaster recovery and chargeback systems are all respected and that the virtualized infrastructure remains optimized over time.
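One way to picture how a constraint analysis turns into an affinity map is the sketch below. It does not call any VMware API; it only derives a plain list of pairwise rules from hypothetical per-VM tags (`zone`, `app`, `peer_group`) that an administrator could then enter as DRS rules. The tag names and the precedence order are assumptions for this example.

```python
from itertools import combinations

def derive_rules(vms):
    """Derive pairwise affinity and anti-affinity rules from constraint tags."""
    rules = []
    for a, b in combinations(sorted(vms), 2):
        va, vb = vms[a], vms[b]
        if va.get("peer_group") and va.get("peer_group") == vb.get("peer_group"):
            # Cluster, replication, or DR peers must not share a physical host.
            rules.append(("anti-affinity", a, b, "cluster or DR peers"))
        elif va["zone"] != vb["zone"]:
            # Security zones are preserved unless policy says otherwise.
            rules.append(("anti-affinity", a, b, "different security zones"))
        elif va.get("app") and va.get("app") == vb.get("app"):
            # Tiers of one application in the same zone may be kept together.
            rules.append(("affinity", a, b, "same application"))
    return rules

# Illustrative tags, not real data.
vms = {
    "web1":      {"zone": "dmz", "app": "shop", "peer_group": "web-cluster"},
    "web2":      {"zone": "dmz", "app": "shop", "peer_group": "web-cluster"},
    "shopdb":    {"zone": "internal", "app": "shop"},
    "shopcache": {"zone": "internal", "app": "shop"},
}

for rule in derive_rules(vms):
    print(rule)
```

Checking the peer constraint first means two members of the same cluster are kept apart even though they belong to the same application, so the generated map never contains both an affinity and an anti-affinity rule for one pair.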
Model Plenty of "What If" Scenarios: Organizations should test out scenarios leveraging analysis of business, technology and workload constraints to better manage their pool of resources. Virtualization allows capacity to be managed in aggregate, providing the potential to revolutionize capacity planning. This makes it possible for businesses to explore a variety of options for optimizing their environment. What would happen, for example, if I virtualized multiple data centers together? Which servers are good candidates for consolidation and will work best together? What is the difference between putting these servers on blades versus rack-mount systems? Altering pre-conceived notions of which servers should be included in an initiative or adjusting risk levels can reveal new opportunities for savings.
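A simple what-if on risk levels can be sketched with a first-fit-decreasing placement, a common bin-packing heuristic rather than any particular vendor's algorithm. The example assumes each server is described only by its peak load in the same units as host capacity, which is a conservative simplification; the loads are invented.

```python
def hosts_needed(peak_loads, host_capacity, target_pct):
    """Estimate host count with first-fit decreasing under a utilization ceiling.

    target_pct is the maximum utilization allowed per host, in percent.
    """
    limit = host_capacity * target_pct / 100
    hosts = []  # load already placed on each host
    for load in sorted(peak_loads, reverse=True):
        if load > limit:
            raise ValueError(f"load {load} exceeds the per-host limit {limit}")
        for i, used in enumerate(hosts):
            if used + load <= limit:
                hosts[i] = used + load
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append(load)
    return len(hosts)

# Illustrative peak loads, as a percentage of one target host.
loads = [30, 25, 20, 20, 15, 10, 10, 5]

for target in (60, 80):
    print(f"target {target}%: {hosts_needed(loads, 100, target)} hosts")
```

With these numbers a cautious 60% ceiling needs three hosts while an 80% ceiling needs two, which puts a concrete price on the extra headroom and gives application owners something tangible to weigh.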
Don't Get "Tunnel Vision" When it Comes to Virtualization: Understand what alternatives exist to virtualization. Any virtualization initiative should be part of an overall optimization program. Organizations need to recognize that virtualization is just one of several strategies that can be put into place. Java applications and J2EE content are already abstracted from their physical environment, and database instances reside in a database server that isolates them from the surrounding infrastructure. Given this, it may not be necessary to virtualize these applications at the operating system level. Utilizing their inherent scaling/clustering strategy may be more effective, both from a technical and a financial perspective.
Virtualization planning is not just a sizing exercise. From a planning and management perspective, virtualization is a multi-faceted challenge that can quickly become political. A methodical and data-driven approach to assessing and planning virtualization opportunities is the best way to drive out risk, positively engage application owners, and ensure that success is achieved beyond the "low hanging fruit". To that end, leveraging multi-dimensional analysis of all critical constraints and carefully planning for the specific technologies and platforms in use is key to assuring the success of virtualization initiatives.