When you plan a VDI storage backend, you probably use known average resource utilization numbers to do the math. For example, if you plan a system of 1k virtual desktops, you might use the following (exemplary) equation: if each desktop needs 20 GB of disk and 10 IOPS on average, then you need a system that can provide a net capacity of 20 TB and 10k IOPS. Simple, isn’t it? The problem is that average numbers are very deceiving. Most of the time desktops are doing nothing. A lot of nothing. But when they are doing something, they can use a lot of resources. The idea behind averages is that, for a large number of desktops, the resource utilization of the active desktops is balanced by the much larger number of idle desktops. The problem is that this is true only to some extent. Here are several scenarios that break the equation:
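Before going through those scenarios, here is the averages math from the example above as a minimal Python sketch. The function name and the decimal conversion (1 TB = 1000 GB) are assumptions made for illustration only.

```python
def average_sizing(desktops, gb_per_desktop, iops_per_desktop):
    """Naive sizing based only on per-desktop averages.

    Returns (net capacity in TB, total IOPS).
    Assumes decimal units: 1 TB = 1000 GB.
    """
    capacity_tb = desktops * gb_per_desktop / 1000
    total_iops = desktops * iops_per_desktop
    return capacity_tb, total_iops


# The example from the text: 1k desktops, 20 GB and 10 IOPS each.
capacity_tb, total_iops = average_sizing(1000, 20, 10)
print(f"{capacity_tb:.0f} TB, {total_iops} IOPS")
```

Note that nothing in this calculation knows about time: it gives the same answer whether the load is spread evenly or arrives all at once.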
Stormy, stormy night (and day): boot, login, logout and anti-virus scan storms
(IOPS) storms are generated by a large number of simultaneous processes that do a lot of IOPS. For example, every morning when the workday starts, people tend to boot and log in to their desktops. This means that in a relatively short period of time many desktops generate a lot of IOPS, much more than their average IOPS number. Similar storms occur at logout and during anti-virus scans. There are many ways to handle storms: pre-boot/login the desktops before people arrive, and stretch the shutdown/logout and anti-virus scan periods over a relatively long time period. For example, if the working hours are from 8:30 to 19:00, you can start pre-booting the desktops at 06:00, perform the anti-virus scan from 20:00 to 00:00 and shut the desktops down afterwards during the night. Other mechanisms that may ease some of the problem are caching, tiering critical data to SSDs, dedup, etc.
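To see why stretching a storm over a longer period helps, here is a rough Python sketch. It assumes boots are spread evenly over the window, and the per-desktop boot load (200 IOPS for 120 seconds) is a made-up illustrative figure, not a measurement.

```python
def storm_peak_iops(desktops, boot_iops, boot_seconds, window_seconds):
    """Rough peak IOPS when boots are spread evenly over a time window.

    boot_iops and boot_seconds are hypothetical per-desktop values.
    """
    # Fraction of the desktops that are booting at any given moment,
    # capped at 1.0 (everybody boots at once).
    overlap = min(1.0, boot_seconds / window_seconds)
    return desktops * overlap * boot_iops


# 1k desktops, everybody boots within 10 minutes of arriving at work.
rush = storm_peak_iops(1000, boot_iops=200, boot_seconds=120, window_seconds=600)

# Same desktops, pre-booted between 06:00 and 08:30 (2.5 hours).
stretched = storm_peak_iops(1000, boot_iops=200, boot_seconds=120, window_seconds=9000)

print(f"10 minute rush: {rush:.0f} IOPS, stretched: {stretched:.0f} IOPS")
```

Under these assumptions, the wider the window, the fewer desktops are booting at the same moment, and the closer the peak gets to what the averages math can handle.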
Much more information on this issue and some other related issues can be found in:
Desktop applications are naughty
You may assume that once all desktops are up and running you are fine. You may be wrong. Very common desktop applications may generate temporary IOPS loads (spikes) that can be even worse than the above storms. For example, here is a graph showing the IOPS done by some common desktop applications (MS-Office 2007, Firefox 3.5, Adobe Reader 9, Internet Explorer 8), measured within a virtualized Win7 desktop running a user simulation load (VMware’s View Planner load).
Figure: IOPS generated by common desktop applications
If you wonder about the Excel and PowerPoint numbers, no, you are not dreaming: these applications generate spikes that can surpass 10K IOPS each! If you ask yourself “How can that be? Why don’t I see such numbers on my system?”, the answer is that the storage backend used here is a very fast disk array that can provide near RAM-disk performance. All other components are standard ones: the host is a dual quad-core CPU server with 96 GB RAM running ESX 4.1, and it is connected to the disk array via fast storage connectivity (2x8Gb FC).
In other words, the applications in this desktop spike because they can, and (almost) nothing limits them. You may claim that in reality, even if you have powerful storage, other (non-storage) resources will limit the desktop. This is probably wrong. The measurements shown were taken while the ESX host ran 64 concurrent desktops, all running the same load. 64 Win7 desktops is also the host’s limit due to memory and CPU resources. This should resemble a typical enterprise-class VDI server.
By the way, if you wonder why desktop applications are implemented this way, the answer is simple: desktop applications assume they have their own set of resources (per physical desktop), and therefore they are not designed to share. Furthermore, desktop local disks are so bad at handling a large number of IOPS that they severely limit even a single application doing many IOPS, making it pointless for the OS to limit the application even further. This explains why even modern physical desktops may become unresponsive and “sluggish” when they get into an IOPS spike.
Of course, an IOPS spike in a physical desktop affects only that desktop. On VDI systems it can affect the entire system, and because nothing attempts to limit the amount of IOPS generated by the desktop applications, these spikes can reach spooky numbers (as shown in the graph above). This makes the entire averages math irrelevant.
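To put a number on "irrelevant", here is a small Python sketch that compares one application spike with the IOPS budget the averages math would give a single host. It reuses figures already mentioned in this post (64 desktops per host, 10 IOPS average per desktop, spikes of about 10K IOPS); the function name is made up for illustration.

```python
def spike_to_budget_ratio(desktops, avg_iops_per_desktop, spike_iops):
    """How many times a single spike exceeds the host's average-based IOPS budget."""
    budget = desktops * avg_iops_per_desktop
    return spike_iops / budget


# One Excel or PowerPoint spike versus the average budget of a 64 desktop host.
ratio = spike_to_budget_ratio(desktops=64, avg_iops_per_desktop=10, spike_iops=10_000)
print(f"A single spike is about {ratio:.1f}x the host's average IOPS budget")
```

With these numbers, one desktop saving one file asks for more IOPS than the averages math has set aside for the whole host many times over.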
Note that you can’t solve application spikes by controlling the timing of the applications’ load, as you can with (IOPS) storms. You can’t tell people not to save their presentations or open a PDF at the same time. People are not aware of such timings, even if they would like to help you.
Of course, there are ways to handle such issues, but first you have to be aware of them.
Here are some interesting links on VDI and the IOPS issues:
Data Storage for VDI – Part 1 – A Personal View (and all posts that follows)
Calculate IOPS in a storage array (a good generic post on IOPS issue)
More IOPS related posts: http://vmtoday.com/tag/iops/