May 04, 2001, 4:12 PM
Q: In April's column you said that CPU usage is inaccurate -- but by
how much, and does it matter?
A: The error is minimal at high usage levels, but ranges up to 80 percent or more
at low levels. The problem is that usage is underreported, and the range of error
increases on faster CPUs.
At a real usage level of 5 percent busy, you'll often see vmstat reporting that
the system is only 1 percent busy -- underreporting by 80 percent
of the true value. You could also look at this as a 400 percent
error in the reported value, because the missing 4 percent is four times
the 1 percent that was reported.
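The two ways of stating the error can be checked with a few lines of Python.
This is only a sketch of the arithmetic above; the function names are made up
for illustration and are not part of any tool mentioned in this column.

```python
def underreport_percent(true_busy, reported_busy):
    """Shortfall expressed as a percentage of the true value."""
    return 100.0 * (true_busy - reported_busy) / true_busy


def error_percent_of_reported(true_busy, reported_busy):
    """The same shortfall expressed as a percentage of the reported value."""
    return 100.0 * (true_busy - reported_busy) / reported_busy


true_busy = 5.0      # percent busy, the real usage in the example
reported_busy = 1.0  # percent busy, as shown by the sampled measurement

print(underreport_percent(true_busy, reported_busy))        # 80.0
print(error_percent_of_reported(true_busy, reported_busy))  # 400.0
```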
As an example of the kind of problem this can cause, consider a system
planned to cope with a load of up
to 1000 users. If you measure the average process activity of the first
20 users, they appear to use only 1 percent of the system (but in
fact use 5 percent). There appears to be sufficient capacity for 2000 users,
but really there is only enough capacity for 400. As the total
user load increases, and the measurement error shrinks, the amount
of CPU used by each user also appears to increase.
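The capacity figures follow from a simple linear extrapolation, sketched
below. It assumes that CPU usage scales linearly with the number of users and
that the system can run at 100 percent busy; the helper name is hypothetical.

```python
def user_capacity(sample_users, busy_percent):
    """Users supportable at 100 percent busy, assuming linear scaling."""
    return sample_users * 100.0 / busy_percent


apparent = user_capacity(20, 1.0)  # extrapolated from the sampled 1 percent
actual = user_capacity(20, 5.0)    # extrapolated from the real 5 percent

print(apparent)  # 2000.0
print(actual)    # 400.0
```

With a target of 1000 users, the sampled figure suggests twice the headroom
needed, while the real figure says the system falls well short.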
I built a tool to measure the errors, collected data on a few
systems, and plotted the results. I would like to get more data, so
the tool has been folded into an updated copy of the process
monitoring update bundle. If you like, you can monitor accuracy on your own
systems and send me the results. I'll start with a more
detailed explanation of the problem, then describe the tool I built,
and show you plots of the initial results.
CPU usage measurements
Normally, CPU time is measured by
sampling, 100 times per second, the state of all CPUs at the clock interrupt.
Process scheduling employs the same clock interrupt used to measure CPU usage,
leading to systematic errors in the sampled data. Microstate accounting,
discussed in April's Performance Q&A, is much more accurate than sampled measurements.
To illustrate how errors occur, I'll excerpt the following example
from April's column:
Consider a performance monitor that wakes up every 10 seconds,
reads some data from the kernel, then prints the results and sleeps. On a fast
system, the total CPU time consumed per wake-up might be a few milliseconds.
On exit from the clock interrupt, the scheduler wakes up processes and kernel
threads that have been sleeping. Processes that sleep consume less than their
allotted CPU time-quanta and always run at the highest timeshare priority.
On a lightly loaded system there is no queue for access to the CPU, so
immediately after the clock interrupt, it's likely that the performance monitor will be
scheduled. If it runs for less than 10 milliseconds it will have completed its task
and be sleeping again by the time the next clock interrupt comes along. Now,
given that CPU time is allocated based on what is running when the clock
interrupt occurs, you can see that the performance monitor could be sneaking a
bite of CPU time whenever the clock interrupt isn't looking.
In the diagram below, a process wakes up and then sleeps three times. The
first wake-up occurs between clock ticks. That run is
interrupted by the subsequent tick, which charges a full 10
milliseconds to the process. The next two wake-ups occur as a result
of the clock interrupt scheduling the process. They complete
before the subsequent interrupt, so there is no charge. The true
CPU usage, as measured by microstate accounting, is 8.3 + 4.6
+ 7.4 = 20.3 ms, while sampling charges only 10 ms. The first wake-up
is overestimated; the second and third are missed completely.
CPU usage error checking tool
I've already extended the SE toolkit to include a process class.
This reports the measured CPU usage -- but if microstate accounting is
not enabled for a process, then the value returned is just the same
as the sampled usage. I modified the process class to report sampled
CPU usage as a separate value, and to explicitly set the microstate
accounting flags to enable accurate measurement of every process and
its children.
I used the new programming interface that was introduced in Solaris 2.6, so this
tool doesn't work on older releases. In Solaris 2.4 to 2.5.1, microstate data is
obtained by issuing an ioctl call with the PIOCUSAGE request code. This also
automatically turns on microstate data collection. (This interface is
still supported but will go away in a future release.) In Solaris
2.6, I obtain data by reading /proc/pid/usage, which no longer
requires special permissions, but which also no longer turns on
microstate data collection, so the tool has to enable it explicitly.