How busy is the CPU, really?
Q: In April's column you said that CPU usage is inaccurate -- but by
how much, and does it matter?
A: The error is minimal at high usage levels, but ranges up to 80 percent or more
at low levels. The problem is that usage is underreported, and the range of error
increases on faster CPUs.
At a real usage level of 5 percent busy, you'll often see vmstat reporting that
the system is only 1 percent busy -- underreporting by 80 percent
of the true value. You could also look at this as a 400 percent
error in the reported value.
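The two percentages describe the same 4-point gap measured against different baselines. A minimal sketch of the arithmetic, using a hypothetical helper named reporting_errors that is not part of any tool mentioned in this column:

```python
def reporting_errors(true_pct, reported_pct):
    """Return the gap as a percentage of the true value and of the reported value."""
    missing = true_pct - reported_pct
    under_report = 100.0 * missing / true_pct          # shortfall relative to the truth
    error_vs_reported = 100.0 * missing / reported_pct  # error relative to what vmstat showed
    return under_report, error_vs_reported


under, over = reporting_errors(true_pct=5.0, reported_pct=1.0)
print(f"underreported by {under:.0f} percent of the true value")
print(f"error of {over:.0f} percent relative to the reported value")
```

With the figures from the text (5 percent true, 1 percent reported) this gives 80 and 400.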
As an example of the kind of problem this can cause, consider a system
planned to cope with a load of up to 1000 users. If you measure the
average process activity of the first 20 users, they appear to use only
1 percent of the system (but in fact use 5 percent). There appears to be
sufficient capacity for 2000 users, but really there is only enough
capacity for 400. As the total user load increases and the measurement
error shrinks, the amount of CPU used by each user also appears to increase.
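The capacity figures follow from a simple linear extrapolation. The sketch below assumes that CPU use scales linearly with the number of users, which is the simplification the example relies on; the function name max_users is made up for illustration:

```python
def max_users(sample_users, cpu_pct_used):
    """Users supportable at 100 percent CPU, assuming load scales linearly per user."""
    return round(100.0 * sample_users / cpu_pct_used)


apparent = max_users(sample_users=20, cpu_pct_used=1.0)  # what the sampled data suggests
actual = max_users(sample_users=20, cpu_pct_used=5.0)    # what the true usage allows

print(f"apparent capacity: {apparent} users")
print(f"actual capacity:   {actual} users")
```

Against a planned load of 1000 users, the apparent figure of 2000 looks comfortable, while the actual figure of 400 falls well short.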
I built a tool to measure the errors, collected data on a few
systems, and plotted the results. I would like to get more data, so
the tool has been folded into an updated copy of the process
monitoring update bundle. If you like, you can monitor accuracy on your own
systems and send me the results. I'll start with a more
detailed explanation of the problem, then describe the tool I built,
and show you plots of the initial results.
CPU usage measurements
Normally, CPU time is measured by sampling the state of all CPUs at the
clock interrupt, 100 times per second.
Process scheduling employs the same clock interrupt used to measure CPU usage,
leading to systematic errors in the sampled data. Microstate accounting,
discussed in April's Performance Q&A, is much more accurate than sampled measurements.
To illustrate how errors occur, I'll excerpt the following example
from April's column:
Consider a performance monitor that wakes up every 10 seconds,
reads some data from the kernel, then prints the results and sleeps. On a fast
system, the total CPU time consumed per wake-up might be a few milliseconds.
On exit from the clock interrupt, the scheduler wakes up processes and kernel
threads that have been sleeping. Processes that sleep consume less than their
allotted CPU time-quanta and always run at the highest timeshare priority.
On a lightly loaded system there is no queue for access to the CPU, so
immediately after the clock interrupt, it's likely that the performance monitor will be
scheduled.
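The effect can be reproduced with a toy model. This is a rough sketch, not the measurement tool described in this column: it assumes a single CPU, a 10-millisecond clock tick, and a monitor that uses 3 milliseconds of CPU on each of ten wake-ups (the 3-millisecond figure and all names are illustrative assumptions). It compares wake-ups that start just after a tick with the same work spread evenly across the tick interval.

```python
TICK_US = 10_000            # clock interrupt every 10 ms (100 samples per second)
WAKE_PERIOD_US = 10_000_000 # the monitor wakes every 10 seconds
WORK_US = 3_000             # assumed CPU time per wake-up (hypothetical)
WAKEUPS = 10


def ticks_hit(start_us, end_us):
    """Count clock ticks that land inside the busy interval [start_us, end_us)."""
    first = -(-start_us // TICK_US)  # integer ceiling of start / tick
    after = -(-end_us // TICK_US)    # integer ceiling of end / tick
    return after - first


def usage(intervals, total_us):
    """Return (true percent busy, sampled percent busy) for a list of busy intervals."""
    true_busy = sum(end - start for start, end in intervals)
    # Each tick that finds the CPU busy is charged a full tick of CPU time.
    sampled_busy = sum(ticks_hit(s, e) for s, e in intervals) * TICK_US
    return 100.0 * true_busy / total_us, 100.0 * sampled_busy / total_us


total_us = WAKEUPS * WAKE_PERIOD_US
wakes = [k * WAKE_PERIOD_US for k in range(WAKEUPS)]

# Scheduled right after the clock interrupt: the work ends before the next tick.
synced = [(w + 1, w + 1 + WORK_US) for w in wakes]
# Same work, but start offsets spread evenly across the tick interval.
spread = [(w + k * 1_000, w + k * 1_000 + WORK_US) for k, w in enumerate(wakes)]

print("synced to tick (true, sampled):", usage(synced, total_us))
print("spread offsets (true, sampled):", usage(spread, total_us))
```

In this model the work that starts just after the tick is never seen by the sampler, while the same work at evenly spread offsets is sampled at its true level. Real systems sit somewhere between these extremes, which is one reason measured data is needed.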