How busy is the CPU, really?
Q:
In April's column you said that CPU usage is inaccurate -- but by
how much, and does it matter?
A: Error is minimal at high usage levels, but ranges up to 80 percent or more at low
levels. The problem is that usage is under reported, and the range of error
increases on faster CPUs.
At a real usage level of 5 percent busy, you'll often see vmstat reporting that
the system is only 1 percent busy -- under reporting by 80 percent
of the true value. You could also look at this as a 400 percent
error in the reported value.
As an example of the kind of problem this can cause, consider a system
planned to cope with a load of up
to 1000 users. If you measure the average process activity of the first
20 users, they only appear to use 1 percent of the system (but in
fact use 5 percent). There appears to be sufficient capacity for 2000 users,
but really there is only enough capacity for 400. As the total
user load increases, and the measurement error reduces, the amount
of CPU used by each user also appears to increase.
I built a tool to measure the errors, collected data on a few
systems, and plotted the results. I would like to get more data, so
the tool has been folded into an updated copy of the process
monitoring update bundle. If you like, you can monitor accuracy on your own
systems and send me the results. I'll start with a more
detailed explanation of the problem, then describe the tool I built,
and show you plots of the initial results.
CPU usage measurements
Normally, CPU time is measured by
sampling, 100 times per second, the state of all CPUs at the clock interrupt.
Process scheduling employs the same clock interrupt used to measure CPU usage,
leading to systematic errors in the sampled data. Microstate accounting,
discussed in April's Performance Q&A, is much more accurate than sampled measurements.
To illustrate how errors occur, I'll excerpt the following example
from April's column:
Consider a performance monitor that wakes up every 10 seconds,
reads some data from the kernel, then prints the results and sleeps. On a fast
system, the total CPU time consumed per wake-up might be a few milliseconds.
On exit from the clock interrupt, the scheduler wakes up processes and kernel
threads that have been sleeping. Processes that sleep consume less than their
allotted CPU time-quanta and always run at the highest timeshare priority.
On a lightly loaded system there is no queue for access to the CPU, so
immediately after the clock interrupt, it's likely that the performance monitor will be
scheduled.Sign up for ITworld's Daily newsletter
Follow ITworld on Twitter @IT_world
jfruh
Apple syncing patent can't come soon enough
pasmith
New Twitter features borrow from 3rd party clients
Esther Schindler
Open Source Changes the Software Acquisition Process
mikelgan
How to set up continuous podcast play on the new iTunes
David Strom
Five important Windows 7 mobility features
sjvn
Guard your Wi-Fi for your own sake
Sandra Henry-Stocker
Grepping on Whole Words
Sidekick: The Good News & the Bad News
Either way you look at it Microsoft Data Center management did not follow standards or best practices in this failure. In which case it makes me wonder more about the outsourcing of corporate data much less personal data.
- mburton325
Join the conversation here
Quick, practical advice for IT pros. Made fresh daily.
Want to cash in on your IT savvy? Send your tip to tips@itworld.com. If we post it, we'll send you a $25 Amazon e-gift card.













where better to put for
where better to put for more responsible behavior than in services that in pocket reach: your mobile phone. Nokia is doing just that in an interesting extension oftheir iPhone silicone cases offering nokia silicone case and the tools, community, and a menu nice away.