October 28, 2013, 10:34 AM
When it comes to making virtual server farms easier to manage, the SAN is the great enabler. But when it comes to maximizing virtual server performance, the SAN is the great bottleneck. A single host running dozens of virtual machines (or more) can easily generate enough I/O operations to slow overall SAN response time, increasing I/O latency and adversely affecting virtual machine performance. Adding more spindles may help in the short term, but that can be very disruptive to the storage infrastructure and does not really take care of the root of the problem: I/O bottlenecks.
One company that is working to improve VM performance by reducing storage I/O latency is PernixData. PernixData FVP is an add-on module for the VMware vSphere hypervisor that creates a cluster of high-speed SSD devices across multiple vSphere hosts. The PernixData "flash cluster" creates a distributed cache for reads and writes to the SAN, accelerating virtual machine I/O without requiring any changes to the VMs or their host datastores. And because the flash cluster is shared among hosts, PernixData FVP fully supports VMware services such as vMotion, DRS, and HA. VMs can continue to move freely from host to host without incurring a cache "miss" penalty.
By consolidating server-side flash into a single shared flash cluster, PernixData FVP leverages many small flash investments into a large I/O improvement. Installation is quick and easy, and it doesn't even require a reboot of the hosts. PernixData FVP comes in SMB and Standard versions. The SMB version is $9,995 per host for up to four hosts and 100 VMs. The Standard version is $7,500 per host with no restrictions on the number of hosts or virtual machines.
Zero to hero
Here is the quick takeaway for PernixData FVP: zero changes to the virtual machine environment. There are no changes to the VMs or to the underlying host datastores, and PernixData FVP is transparent to both the VMs and the SAN.
PernixData FVP works with vSphere 5.0 and vSphere 5.1 hosts. It installs into the vSphere kernel via the VMware update utility and accelerates I/O requests between VMs and the SAN. Configuration consists of creating the flash cluster and adding the server-side flash devices to the cluster. You then designate the datastore to be accelerated. This is a bulk operation -- you do it once for the entire VM datastore on the SAN, not for each VM. The VMs residing in the datastore will automatically inherit the attributes of the FVP flash cluster, as will any VMs added to the datastore later. Alternatively, you can add individual VMs to the flash cluster so that only the I/O of chosen VMs is accelerated.
You can create the flash cluster using any PCIe or SSD flash device found on VMware's hardware compatibility list. Best of all, the server-side flash can consist of a heterogeneous mix of devices. You don't have to install the same flash hardware, or even the same capacity, in each host. As of this 1.0 release, FVP only works with block-based network storage, such as iSCSI and Fibre Channel SANs. Support for file-based storage (that is, NFS) will be available in future versions, as will support for additional hypervisors beyond VMware vSphere.
The size of the flash cluster is based on the I/O activity of the running VMs and not the underlying storage footprint. Thus, for a multiterabyte datastore for all of your VMware VMs, you won't have to break the bank and install multigigabyte or multiterabyte flash devices. However, as with any caching system, PernixData FVP will suffer some read misses during VM startup and for a short initial period.
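To make the sizing point concrete, here is a minimal Python sketch of a cache warming up. It is a toy model, not PernixData's algorithm: the block counts, the `replay` helper, and the use of a plain set as the "flash" are all assumptions chosen for illustration. The cache is sized for the blocks the VMs actually touch, not for the whole datastore.

```python
def replay(trace, cache):
    """Replay block reads against a cache (a set of block IDs); return the hit rate."""
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
        else:
            cache.add(block)  # cold miss: read from the SAN, then keep a copy in flash
    return hits / len(trace)

# Hypothetical numbers: a large datastore of which the VMs actively touch a small part.
DATASTORE_BLOCKS = 1_000_000
working_set = list(range(0, DATASTORE_BLOCKS, 1000))  # 1,000 hot blocks

cache = set()  # big enough for the working set, so nothing is evicted in this sketch
cold_pass = replay(working_set, cache)  # right after VM startup
warm_pass = replay(working_set, cache)  # the same blocks read again

print(f"cold hit rate: {cold_pass:.0%}, warm hit rate: {warm_pass:.0%}")
print(f"flash used: {len(cache)} of {DATASTORE_BLOCKS} datastore blocks")
```

In this model every read on the first pass misses and every read on the second pass hits, while the cache holds only a small fraction of the datastore's blocks.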
The write stuff
The flash clusters in PernixData FVP work in either write-through or write-back mode. A write-through flash cluster accelerates reads from the SAN but not writes. Because writes are committed to the server-side flash and the SAN simultaneously, write performance is still bound to the write latency of the SAN.
The write-back policy accelerates both reads and writes, with writes committed to the cache first, then copied to the SAN in the background. Keeping cached I/O data safe is essential because there's always a chance that uncommitted data will be lost in the event of a host failure. FVP prevents this by replicating writes to other cache nodes. Write-back mode allows for zero, one, or two replicas to be stored across the cluster to help prevent data loss. You configure this on a per-VM basis, so you can make the cache fault tolerant for some VMs but not others.
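The difference between the two policies, and the role of replicas, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not FVP's implementation: the `Host` and `FlashCluster` classes, the host names, and the `flush` method are hypothetical, and the replica count is set per cluster here for brevity even though FVP sets it per VM.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    flash: dict = field(default_factory=dict)  # block -> data held in this host's flash
    alive: bool = True

class FlashCluster:
    """Toy model of a shared flash cache in front of a SAN (hypothetical, not FVP's API)."""

    def __init__(self, hosts, policy="write-back", replicas=1):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy")
        if replicas not in (0, 1, 2):
            raise ValueError("replicas must be 0, 1, or 2")
        self.hosts, self.policy, self.replicas = hosts, policy, replicas
        self.san = {}       # block -> data committed to the SAN
        self.dirty = set()  # blocks acknowledged to the VM but not yet on the SAN

    def write(self, host, block, data):
        host.flash[block] = data
        if self.policy == "write-through":
            self.san[block] = data  # the write completes only once the SAN has it
            return
        # Write-back: copy the write to peer hosts' flash and defer the SAN write.
        peers = [h for h in self.hosts if h is not host and h.alive][: self.replicas]
        for peer in peers:
            peer.flash[block] = data
        self.dirty.add(block)

    def flush(self):
        """Background destage to the SAN; returns the set of blocks that were lost."""
        lost = set()
        for block in sorted(self.dirty):
            copies = [h.flash[block] for h in self.hosts if h.alive and block in h.flash]
            if copies:
                self.san[block] = copies[0]
            else:
                lost.add(block)
        self.dirty.clear()
        return lost

def lost_after_failure(replicas):
    hosts = [Host("esx1"), Host("esx2"), Host("esx3")]
    cluster = FlashCluster(hosts, policy="write-back", replicas=replicas)
    cluster.write(hosts[0], block=42, data=b"payload")
    hosts[0].alive = False  # the host fails before the background flush runs
    return cluster.flush()

print("lost with 0 replicas:", lost_after_failure(0))
print("lost with 1 replica: ", lost_after_failure(1))
```

With zero replicas, the only copy of block 42 dies with the host. With one replica, a surviving peer still holds the write and can destage it to the SAN.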
The PernixData dashboard provides excellent real-time views into how the flash cluster is performing. This VM IOPS chart shows an effective IOPS of nearly 60,000, with local flash contributing roughly 50,000 and the SAN only about 9,000.
Admins will have no trouble analyzing the health and performance of the flash cluster. The PernixData management console integrates into vCenter Server. Find the Performance section under the PernixData tab in vCenter, and a wide range of customizable charts and graphs are available, including virtual machine IOPS, virtual machine latency, and cache hit rate and eviction.
The hit rate and eviction rate charts are the keys to ensuring the flash cluster is sized correctly for the number of running VMs. Hit rate is a measure of how many I/O operations are being served by the server-side flash as opposed to being served by the SAN. A hit rate of 100% tells us our flash cluster is sized correctly for the running VMs in our environment. A real-world hit rate of 85% is typical and reasonable.
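As a rough check, the approximate figures from the VM IOPS chart described earlier can be turned into a flash share. The short Python sketch below assumes that the share of IOPS served from flash is a fair stand-in for hit rate, which is a simplification, and the inputs are the rounded numbers read off the chart.

```python
flash_iops = 50_000  # "roughly 50,000" served from local flash
san_iops = 9_000     # "about 9,000" served from the SAN

flash_share = flash_iops / (flash_iops + san_iops)
print(f"share of I/O served from flash: {flash_share:.1%}")  # 84.7%
```

That lands close to the 85% figure cited as typical for real-world deployments.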
Eviction rate is a measure of how much data is being flushed from the cache to make room for new data. Let's say you have 100GB of server-side flash for two VMs, each with a size of 75GB. Because only 100GB of cache are available and the VMs have a total working size of 150GB, at some point older data will have to be cleared from the cache to accommodate new data. The percentage of data being removed from the cache is the eviction rate. In a perfect world, the eviction rate would be 0%, indicating that the server-side flash was large enough to satisfy all the reads and writes for both VMs.
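The 100GB versus 150GB example can be played out with a small simulation. The Python sketch below rests on several assumptions that are not from FVP: an LRU replacement policy, 1GB cache chunks, uniformly random access across both VMs' working sets, and eviction rate defined as evictions per I/O. The `simulate` function is hypothetical.

```python
import random
from collections import OrderedDict

def simulate(cache_gb, trace):
    """Replay a trace of 1 GB chunk IDs through an LRU cache.

    Returns (hit_rate, eviction_rate), both measured per I/O.
    """
    cache = OrderedDict()
    hits = evictions = 0
    for chunk in trace:
        if chunk in cache:
            hits += 1
            cache.move_to_end(chunk)       # mark as most recently used
        else:
            if len(cache) >= cache_gb:
                cache.popitem(last=False)  # evict the least recently used chunk
                evictions += 1
            cache[chunk] = True
    return hits / len(trace), evictions / len(trace)

rng = random.Random(0)
# Two VMs with 75 GB working sets: 150 distinct 1 GB chunks, accessed at random.
trace = [rng.randrange(150) for _ in range(20_000)]

for size in (100, 150):
    hit, evict = simulate(size, trace)
    print(f"{size} GB cache: hit rate {hit:.1%}, eviction rate {evict:.1%}")
```

With 100GB of flash, roughly a third of the I/Os in this model miss and force an eviction. With 150GB, the eviction rate drops to zero and the only misses are the initial cold reads.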
PernixData FVP is a great product for any VMware server farm bogged down by SAN bottlenecks. It installs cleanly into the hypervisor, works with heterogeneous flash devices, and scales as the vSphere cluster grows. Best of all, it provides true write-back capabilities. And because PernixData FVP doesn't restrict vMotion and other VMware services, there really isn't a scenario where its distributed flash cache doesn't make sense to improve VM and SAN performance.
Pros
- Greatly improves both read and write performance of virtual machines stored on a SAN (Fibre Channel, FCoE, or iSCSI)
- Compatible with VMware vMotion, DRS, HA; mobility of VMs is unimpeded
- Works with any server-side flash device on the VMware compatibility list (PCIe and SSDs)
- Easy to install and maintain
Cons
- Works only with block-based storage systems, not file-based storage
- Available only for VMware vSphere 5.0 and 5.1
This article, "Review: Flash your way to better VMware performance," was originally published at InfoWorld.com.