April 21, 2001, 11:37 PM — Q: Some of my disks get really slow when they are nearly 100 percent busy; however, when I see a striped volume or hardware RAID unit at high utilization levels, it still seems to respond quickly. Why is this? Do the old rules about high utilization still apply?
A: This occurs because more complex systems don't obey the same rules as simple systems when it comes to response time, throughput, and utilization. Even the simple systems aren't so simple. I'll begin our examination of this phenomenon by looking at a single disk, and then move on to combinations.
Part of this answer is based on my September 1997 Performance Q&A column. The information from the column was updated and included in my book as of April 1998, and has been further updated for inclusion in Sun BluePrints for Resource Management. Written by several members of our group at Sun, this book will be published this summer (see Resources for more information on both the book and the column). I've added much more explanation and several examples here.
Measurements on a single disk
In an old-style, single-disk model, the device driver maintains a queue of waiting requests that are serviced one at a time by the disk. The terms utilization, service time, wait time, throughput, and wait queue length have well-defined meanings in this scenario; and, for this sort of basic system, the setup is so simple that a very basic queuing model fits it well.
Over time, disk technology has moved on. Nowadays, a standard disk is SCSI-based and has an embedded controller. The disk drive contains a small microprocessor and about 1 MB of RAM. It can typically handle up to 64 outstanding requests via SCSI tagged-command queuing. The system uses an SCSI host bus adaptor to talk to the disk. In large systems, there is yet another level of intelligence and buffering in a hardware RAID controller. However, the iostat utility is still built around the simple disk model above, and its use of terminology still assumes a single disk that can only handle a single request at a time. In addition, iostat uses the same reporting mechanism for client-side NFS mount points and complex disk volumes set up using Solstice DiskSuite or Veritas Volume Manager.
In the old days, if the device driver sent a request to the disk, the disk would do nothing else until it completed the request. The time this process took was the service time, and the average service time was a physical property of the disk itself. Disks that spun and sought faster had lower (and thus better) service times. With today's systems, if the device driver issues a request, that request is queued internally by the RAID controller and the disk drive, and several more requests can be sent before a response to the first comes back. The service time, as measured by the device driver, varies according to the load level and queue length, and is not directly comparable to the old-style service time of a simple disk drive. The response time is defined as the total waiting time in the queue plus the service time. Unfortunately, as I've mentioned before, iostat reports response time but labels it svc_t. We'll see later how to calculate the actual service time for a disk.
As soon as a device has one request in its internal queue, it becomes busy, and the proportion of the time that it is busy is the utilization. If there is always a request waiting, then the device is 100 percent busy. Because a single disk can only complete one I/O request at a time, it saturates at 100 percent busy. If the device has a large number of requests, and it is intelligent enough to reorder them, it may reduce the average service time and increase the throughput as more load is applied, even though it is already at 100 percent utilization.













