How busy is the CPU, really?

Unix Insider

Q:
In April's column you said that CPU usage is inaccurate -- but by
how much, and does it matter?


A: The error is minimal at high usage levels, but it can reach 80 percent or more at low
levels. The problem is that usage is underreported, and the range of error
increases on faster CPUs.
At a real usage level of 5 percent busy, you'll often see vmstat reporting that
the system is only 1 percent busy -- underreporting by 80 percent
of the true value. You could also look at this as a 400 percent
error in the reported value.
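
The two ways of stating the error can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and are not part of any tool described here.

```python
def underreport_fraction(true_pct, reported_pct):
    """Shortfall expressed as a fraction of the true usage."""
    return (true_pct - reported_pct) / true_pct


def error_vs_reported(true_pct, reported_pct):
    """The same shortfall expressed as a fraction of the reported usage."""
    return (true_pct - reported_pct) / reported_pct


# True usage 5 percent, vmstat reports 1 percent.
print(underreport_fraction(5, 1))  # 0.8, i.e. 80 percent of the true value
print(error_vs_reported(5, 1))     # 4.0, i.e. 400 percent of the reported value
```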


As an example of the kind of problem this can cause, consider a system
planned to cope with a load of up
to 1000 users. If you measure the average process activity of the first
20 users, they appear to use only 1 percent of the system (but in
fact use 5 percent). There appears to be sufficient capacity for 2000 users,
but really there is only enough capacity for 400. As the total
user load increases and the measurement error shrinks, the amount
of CPU used by each user appears to increase.
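
The capacity figures follow from a simple linear extrapolation, sketched below. The function name is hypothetical, and the sketch assumes CPU use scales linearly with the number of users, which is the same simplification the example relies on.

```python
def projected_capacity(measured_users, cpu_pct_used):
    """Users the system could support if CPU use scaled linearly to 100 percent."""
    return measured_users * 100 / cpu_pct_used


# 20 users measured at an apparent 1 percent, versus the true 5 percent.
print(projected_capacity(20, 1))  # 2000.0 users according to the sampled figure
print(projected_capacity(20, 5))  # 400.0 users according to the true figure
```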


I built a tool to measure the errors, collected data on a few
systems, and plotted the results. I would like to get more data, so
the tool has been folded into an updated copy of the process
monitoring update bundle. If you like, you can monitor accuracy on your own
systems and send me the results. I'll start with a more
detailed explanation of the problem, then describe the tool I built,
and show you plots of the initial results.


CPU usage measurements

Normally, CPU time is measured by
sampling, 100 times per second, the state of all CPUs at the clock interrupt.
Process scheduling employs the same clock interrupt used to measure CPU usage,
leading to systematic errors in the sampled data. Microstate accounting,
discussed in April's Performance Q&A, is much more accurate than sampled measurements.


To illustrate how errors occur, I'll excerpt the following example
from April's column:

Consider a performance monitor that wakes up every 10 seconds,
reads some data from the kernel, then prints the results and sleeps. On a fast
system, the total CPU time consumed per wake-up might be a few milliseconds.
On exit from the clock interrupt, the scheduler wakes up processes and kernel
threads that have been sleeping. Processes that sleep consume less than their
allotted CPU time-quanta and always run at the highest timeshare priority.



On a lightly loaded system there is no queue for access to the CPU, so
immediately after the clock interrupt, it's likely that the performance monitor will be
scheduled. If it runs for less than 10 milliseconds it will have completed its task
and be sleeping again by the time the next clock interrupt comes along. Now,
given that CPU time is allocated based on what is running when the clock
interrupt occurs, you can see that the performance monitor could be sneaking a
bite of CPU time whenever the clock interrupt isn't looking.



In the diagram below, a process wakes up and goes back to sleep three
times. The first wake-up occurs between clock ticks. The period is
interrupted by the subsequent tick, which charges a full 10
milliseconds to the process. The next two wake-ups occur as a result
of the clock interrupt scheduling the process. They complete
before the subsequent interrupt, so there is no charge. The true
CPU usage, as measured by microstate accounting, is 8.3 + 4.6
+ 7.4 = 20.3 ms, but sampling charges only 10 ms. The first wake-up
is overestimated; the second and third are missed completely.
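
The same three bursts can be replayed in a small simulation. This is a sketch under stated assumptions: the burst start times are invented to match the description (the first burst straddles a tick, the other two start just after one), and the function name sampled_cpu_us is hypothetical. Times are integer microseconds to avoid floating-point rounding.

```python
TICK_US = 10_000  # clock interrupt every 10 ms (100 times per second)

# (start time, run time) in microseconds. Start times are assumptions.
BURSTS = [
    (5_000, 8_300),   # starts between ticks, still running at the 10 ms tick
    (30_100, 4_600),  # scheduled just after a tick, asleep before the next
    (50_100, 7_400),  # same pattern again
]


def sampled_cpu_us(bursts, tick_us=TICK_US):
    """Charge one full tick for every clock interrupt that finds the process running."""
    charged = 0
    for start, duration in bursts:
        end = start + duration
        first_tick = -(-start // tick_us) * tick_us  # first tick at or after start
        for _ in range(first_tick, end, tick_us):
            charged += tick_us
    return charged


def true_cpu_us(bursts):
    """Sum the actual run times, as microstate accounting would."""
    return sum(duration for _, duration in bursts)


print(sampled_cpu_us(BURSTS))  # 10000 us charged by sampling
print(true_cpu_us(BURSTS))     # 20300 us actually consumed
```

With these assumed start times, sampling sees about half of the CPU time the process really used.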


CPU usage error checking tool

I've already extended the SE toolkit to include a process class.
This reports the measured CPU usage -- but if microstate accounting is
not enabled for a process, then the value returned is just the same
as the sampled usage. I modified the process class to report sampled
CPU usage as a separate value, and to explicitly set the microstate
accounting flags to enable accurate measurement of every process and
its children.


I used the new programming interface that was introduced in Solaris 2.6, so this
tool doesn't work on older releases. In Solaris 2.4 to 2.5.1, microstate data is
obtained by issuing an ioctl call with the PIOCUSAGE request. This also
automatically turns on microstate data collection. (This interface is
still supported but will go away in a future release.) In Solaris
2.6, I obtain data by reading /proc/pid/usage, which no longer
requires special permissions, but which also no longer turns on
microstate data collection.
