Running a virtual machine over iSCSI SAN? Check your swappiness.

If you run into I/O errors in your VMs, adjusting these settings may help.

Image: screenshot of a virtual machine (rh62). Credit: Matias Kreder

Server virtualization is an adventure. There are so many different strategies, so many techniques, and so many gotchas involved that it can easily consume a giant chunk of your time. The benefits of virtualization are so great, however, that I'm not sure I'd ever deploy another solo bare-metal server again.

To make your VMs even more flexible, you might consider storing the VM disks on a storage area network (SAN). That way you gain greater disk resiliency, easier capacity expansion, and greater portability, since you can move the VM to another host without moving its storage. If you're like us, you don't have the money for a Fibre Channel SAN, but iSCSI works remarkably well when set up properly.

Depending on your network, you may run into I/O issues from time to time in virtual machines running over a SAN, especially Linux machines. At periods of high activity, the latency of communication between a VM and the SAN might rise beyond the disk timeout that the guest OS tolerates. This could indicate a network issue, but assuming you've set everything up properly, it could simply be network congestion that is unavoidable for the moment. This temporary loss of communication between the disk and the host can cause a kernel panic, or a pile of I/O errors on the VM like 'rejecting I/O to offline device' that persist until you reboot it.
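
One quick way to confirm that a VM is hitting this problem is to search its kernel log for the error text. The sketch below is a minimal example, not a definitive check: the function name count_offline_io_errors is hypothetical, and the default log path /var/log/messages is an assumption that varies by distro (some use /var/log/syslog or /var/log/kern.log instead).

```shell
#!/bin/sh
# Count kernel log lines that report I/O being rejected on an offline disk.
# Assumption: the log lives at /var/log/messages; pass another path if your distro differs.
count_offline_io_errors() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'rejecting I/O to offline device' "$log"
}
```

Calling count_offline_io_errors /var/log/syslog prints the number of matching lines in that file. A nonzero count that lines up with busy periods suggests the timeout behavior described here rather than a failing disk.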

To help avoid this, you might consider making two changes to your Linux VM operating systems: decrease the swappiness and increase the disk timeout.

Swappiness controls how aggressively the kernel pushes runtime memory out to disk to free up memory for other operations. If swapping is too aggressive, it can generate a lot of disk I/O as memory is swapped out. Reducing (but not eliminating) the degree of swappiness can reduce I/O considerably. The kernel parameter default is 60 (out of 100). In my experience, reducing the parameter to 10 works out well when we run into I/O issues. To adjust the parameter, open the file /etc/sysctl.conf (on most distros) and add the following line:

vm.swappiness=10
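
If you'd rather script the change than edit the file by hand, something like the following could work. It is a sketch under assumptions: the helper name set_swappiness and its optional second argument (a config path, handy for trying it on a scratch copy) are hypothetical, and it assumes a plain key=value sysctl file. It updates an existing vm.swappiness line if one is present and appends one otherwise.

```shell
#!/bin/sh
# Set vm.swappiness in a sysctl-style config file without duplicating the line.
# Usage: set_swappiness VALUE [CONF]   (CONF defaults to /etc/sysctl.conf)
set_swappiness() {
    value="$1"
    conf="${2:-/etc/sysctl.conf}"
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf" 2>/dev/null; then
        # A setting already exists: rewrite it via a temp file, then copy the
        # result back so the original file keeps its owner and permissions.
        sed 's/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness='"$value"'/' "$conf" > "$conf.tmp" \
            && cat "$conf.tmp" > "$conf" \
            && rm -f "$conf.tmp"
    else
        # No setting yet: append one.
        printf 'vm.swappiness=%s\n' "$value" >> "$conf"
    fi
}
```

Run as root, set_swappiness 10 edits /etc/sysctl.conf. Running sysctl -p afterwards loads the file without a reboot, and cat /proc/sys/vm/swappiness shows the value currently in effect.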

Next, you may consider increasing the disk timeout threshold. To do this, you need to set an integer value, in seconds, in the file /sys/block/sda/device/timeout.

The default value is 30; increasing it to 180 should be sufficient. You can't just edit the device timeout file, though, because its value is reset on every reboot. To make the change persist through reboots, add it to the startup file at /etc/rc.local like so:

nano /etc/rc.local

Then add the following line to the file, above the exit 0 line:

echo 180 > /sys/block/sda/device/timeout

That way 180 will be written to the /sys/block/sda/device/timeout file each time the system boots up.
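
The line above only covers the first disk, sda. If a VM has several SAN-backed disks, a small loop in /etc/rc.local could apply the same timeout to each of them. This is a sketch, assuming the disks appear as sd* devices; the function name set_disk_timeouts and its optional second argument (a directory to use in place of /sys/block, for dry runs) are hypothetical.

```shell
#!/bin/sh
# Write a timeout (in seconds) to every sd* disk found under a block directory.
# Usage: set_disk_timeouts SECONDS [BLOCK_DIR]   (BLOCK_DIR defaults to /sys/block)
set_disk_timeouts() {
    seconds="$1"
    block_dir="${2:-/sys/block}"
    for f in "$block_dir"/sd*/device/timeout; do
        # If no sd* devices exist the glob stays unexpanded, so skip it.
        [ -e "$f" ] || continue
        echo "$seconds" > "$f"
    done
}
```

Placing the function and a call such as set_disk_timeouts 180 above exit 0 in /etc/rc.local would apply the timeout to every matching disk at each boot.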

These tips should help with your Linux VMs if you're having intermittent issues, but keep an eye out for a more fundamental problem with your setup as well. Oh, and the reason this isn't usually a problem with Windows VMs is that Windows manages memory differently, via a pagefile. That approach has its own problems, like out-of-memory errors, but it doesn't usually suffer from this I/O issue.
