So even with live migration and a SAN backend, you are not getting complete immunity from downtime. Yes, hardware issues on your host server are eliminated from the downtime equation, but this improvement in uptime comes at a premium. SANs are costly when purchased from the category leaders. Smaller companies offer more economical solutions, but regardless of the vendor, this class of storage is considerably more expensive than distributed or local storage.
Distributed storage can be defined as self-contained islands of local storage associated with multiple servers, such that each local store remains accessible only to the server it is attached to, but where the aggregate capacity of all such islands can be used to store the total set of VMs required.
For example, if the servers are running hypervisors for VDI, individual user VMs are mapped to specific local stores. The aggregate of all local stores is the distributed storage capacity. The big advantages are that setup is simple, no SAN or specialized storage management tools are necessary, the server configuration is fairly simple to order and support, and the cost per terabyte is much lower than for SAN-based centralized storage.
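To make the definition concrete, here is a minimal Python sketch of VMs being mapped onto per-host local stores. The `Host` class, the `place_vm` helper and the "most free space" placement rule are all assumptions made for illustration; they do not describe any particular hypervisor or VDI product.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical hypervisor host with its own local store."""
    name: str
    capacity_gb: int
    vms: dict = field(default_factory=dict)  # VM name -> disk size in GB

    @property
    def free_gb(self) -> int:
        # Space left on this host's local store only.
        return self.capacity_gb - sum(self.vms.values())


def aggregate_capacity_gb(hosts) -> int:
    """The distributed storage capacity: the sum of all local stores."""
    return sum(h.capacity_gb for h in hosts)


def place_vm(hosts, vm_name: str, size_gb: int) -> Host:
    """Map a VM to one local store (assumed rule: the host with most free space).

    A VM must fit entirely on a single host, because each local store is
    accessible only to the server it is attached to.
    """
    candidates = [h for h in hosts if h.free_gb >= size_gb]
    if not candidates:
        raise ValueError(f"no single local store has {size_gb} GB free")
    target = max(candidates, key=lambda h: h.free_gb)
    target.vms[vm_name] = size_gb
    return target
```

With two hosts of 1000 GB each, two 600 GB VMs land on different hosts, and a third 600 GB VM is rejected even though 800 GB is free in aggregate. The aggregate figure describes how many VMs the whole pool can hold, not a single shared volume.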
The downside is uptime. If the hypervisor running your VM crashes, you cannot move that VM to a different hypervisor using live migration, because its disk sits on a local store that only the crashed server can reach, and the end user will experience some downtime. But to understand the real-world implications of this kind of failure, we need to consider the two cases in which it may occur.
The first case is an intermittent issue on the failing server that requires a reboot or a process restart. In this event, a few minutes of downtime waiting for the server to come back online is all the end user will have to deal with; not completely unlike rebooting a conventional PC. The chances of this sort of issue are usually far lower on a server than on a PC, though, as most servers use higher-grade components, error-correcting memory, redundant network interfaces and so on.
The second and more problematic scenario is when the server crashes entirely and cannot be recovered via a reboot. In this event, the user's VM is stuck on a local store that is no longer accessible. Clearly, this is a troublesome outcome because, in order to restore the user's session, the entire VM needs to be re-created and access to the user's data needs to be restored too. But as with all things in IT, one shouldn't rush to conclusions. It turns out you can architect things in a way that makes even this scenario not quite as disastrous as it appears at first blush.
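The two failure cases can be summarized in a short Python sketch that, given a VM-to-host mapping, reports which users are affected by a host failure and what it takes to get them back. The `Failure` enum, the `impact` function and the action strings are hypothetical names chosen for illustration, not part of any real tool.

```python
from enum import Enum


class Failure(Enum):
    TRANSIENT = "transient"  # fixed by a reboot or process restart
    PERMANENT = "permanent"  # host cannot be recovered via a reboot


def impact(vm_to_host: dict, failed_host: str, failure: Failure) -> dict:
    """Return the recovery action for each VM on the failed host.

    Only VMs whose local store is attached to the failed host are affected;
    VMs on other hosts keep running.
    """
    if failure is Failure.TRANSIENT:
        action = "wait for host to come back online"
    else:
        action = "re-create VM and restore access to user data"
    return {
        vm: action
        for vm, host in vm_to_host.items()
        if host == failed_host
    }
```

For example, with `{"alice": "hv1", "bob": "hv2"}` and a permanent failure of `hv1`, only `alice` appears in the result, which is the "islands" property working in your favor: a failure is confined to the VMs on that one local store.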