Unix tip: Using sar for long-term performance analysis

By Sandra Henry-Stocker

The sar command, native to Solaris and often installed on Linux systems via the sysstat package, is extremely useful for analyzing current and recent system performance. If you want to view performance on your servers over a longer span (more than a month), however, you need to take some extra steps to preserve your data.

Since I am both a sysadmin and a climate activist, I am concerned about the power being used by idle servers. And I'm not alone. At the Green Computing Summit that I attended in Washington, DC a little over a week ago, representatives from a number of federal agencies and many energy-conscious companies discussed their concern for both the cost and climatic repercussions of energy wasted on idle systems.

Given these concerns, I have been keeping data on CPU usage for a large collection of servers over a long span of time. These statistics give me a good idea of which servers are seriously underused and might be shut down, either completely or when not in use, and which ones I can justify running 7x24.

The first step in this long-term performance data collection is running sar normally on all the servers that I want to monitor. In the default setup, sar collects data during normal business hours and generates a report in the evening (Monday through Friday). Since my interest in long-term analysis was limited to keeping track of CPU usage, I collected only a small portion of each daily report to generate monthly averages.

The script shown below assumes that the sar* (report) files have been gathered in per-system directories under /perf/data (for example, /perf/data/boson). File dates are preserved during the transfer, and files more than one month old are removed. This keeps the directories current and avoids any question of whether a file named sar06, for example, represents system performance on May 6th or June 6th.
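
The pruning step itself is not shown in this article. A minimal sketch, assuming the reports are files named sar* and taking "more than one month old" to mean a modification time roughly 31 or more days in the past; the function name prune_old_sar is hypothetical:

```shell
#!/bin/sh
# Sketch only: remove sar* report files older than about one month.
# prune_old_sar is a hypothetical helper, not part of showUsage.
prune_old_sar() {
    dir=$1
    # -mtime +31 matches files whose age in whole days exceeds 31;
    # only names starting with "sar" are touched, so the usage log survives
    find "$dir" -type f -name 'sar*' -mtime +31 -exec rm -f {} \;
}

# Example (not run here): prune_old_sar /perf/data/boson
```

For the transfer, options such as scp -p or rsync -t keep the original modification times, which is what the date columns in the report depend on.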

Invoked directly, this script prepares a monthly report on a single server. The report includes the average CPU usage line from each of the sar reports created in the prior 30 days. Notice that the report below skips weekends (when performance statistics are not collected).

# ./showUsage boson
boson:          %usr    %sys    %wio    %idle
Apr 30          15      5        1      78
May  1          15      5        1      79
May  2          16      5        1      78
May  5          15      5        1      78
May  6          15      6        4      75
May  7          15      5        1      79
May  8          15      5        1      79
May  9          16      7        2      76
... truncated ...
May 29          15      6        1      75
Average         15      5        1      74

On the first day of each month, the script will also add the final line in this report -- the average of the daily averages, thus the monthly average -- to the system's usage log (in this case, /perf/data/boson/usage). That log will look something like this:

Apr 2008:       2       6       15      74
May 2008:       15      5        1      74
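
Each log line is labeled with the month that has just ended, so a run on January 1st has to step back a year as well as a month. A minimal sketch of deriving that label, assuming the month abbreviation and four-digit year come from date output; prev_month_label is a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: derive the "Mon YYYY" label for the month before a given one.
# prev_month_label is a hypothetical helper; it takes a month abbreviation
# and a four-digit year.
prev_month_label() {
    mo=$1
    yr=$2
    prev=Dec                      # the month that precedes Jan
    for m in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
        if [ "$m" = "$mo" ]; then
            # January rolls back into the previous year
            if [ "$mo" = Jan ]; then
                yr=$(($yr - 1))
            fi
            echo "$prev $yr"
            return 0
        fi
        prev=$m
    done
    return 1                      # unrecognized month abbreviation
}

prev_month_label May 2008   # prints "Apr 2008"
prev_month_label Jan 2009   # prints "Dec 2008"
```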

While monthly averages may hide a lot of detail, monthly averages that include only normal working hours can still give you an indication of how busy a system is. The report above gives an entirely different impression than this one:

fermion:        %usr    %sys    %wio    %idle
Apr 30          0       1       0       99
May  1          0       1       0       99
May  2          0       1       0       99
May  5          0       1       0       99
May  6          0       1       0       99
May  7          0       1       0       99
May  8          0       1       0       99
May  9          0       1       0       99
... truncated ...
May 29          0       1       0       99
Average         0       1       0       99

The showUsage script reads the Average line from each sar* report, calculates the average of all such lines and prints the report. It uses a simple case statement to retrieve the previous month for use in updating the monthly log when run on the first day of any month.

#!/bin/ksh

# -----------------------------------------------------------------
# init variables
# -----------------------------------------------------------------
PerfDir="/perf/data"
USR=0
SYS=0
WIO=0
IDLE=0
DAYS=0

# -----------------------------------------------------------------
# Require system name
# -----------------------------------------------------------------
if [ $# == 0 ]; then
    echo "Usage: $0 <system>"
    exit 1
fi

# -----------------------------------------------------------------
# Exit if no directory
# -----------------------------------------------------------------
if [ ! -d $PerfDir/$1 ]; then
    print "No data for $1"
    exit 2
fi

# -----------------------------------------------------------------
# go to data directory for particular system
# -----------------------------------------------------------------
cd $PerfDir/$1 || exit 2

# -----------------------------------------------------------------
# Add padding for nice columns
# -----------------------------------------------------------------
if [ ${#1} -lt 7 ]; then
    NAME="$1:\t"
else
    NAME="$1:"
fi

# -----------------------------------------------------------------
# Exit if there are no data files
# -----------------------------------------------------------------
if [ `ls /perf/data/$1/sar* 2>/dev/null | wc -l` == 0 ]; then
    echo "$NAME No data"
    exit 1
fi

# -----------------------------------------------------------------
# add heading
# -----------------------------------------------------------------
print "$NAME\t%usr\t%sys\t%wio\t%idle"

# -----------------------------------------------------------------
# process sar files in date order
# -----------------------------------------------------------------
for file in `ls -tr sar*`
do
    if [ -z $file ]; then
	continue
    fi
    ls -l $file | read x x x x x MO DAY etc
    if [ ${#DAY} == 1 ]; then
	DAY=" $DAY"
    fi
    grep Average $file | head -1 | read x usr sys wio idle || continue
    print "$MO $DAY\t\t$usr\t$sys\t$wio\t$idle"
    # accumulate totals
    USR=$(($USR + $usr))
    SYS=$(($SYS + $sys))
    WIO=$(($WIO + $wio))
    IDLE=$(($IDLE + $idle))
    DAYS=$(($DAYS + 1))
done

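# compute averages (bail out if no report held an Average line,
# which would otherwise cause a division by zero)
if [ $DAYS -eq 0 ]; then
    print "$NAME No Average lines found"
    exit 1
fi
UAVG=$(($USR / $DAYS))
SAVG=$(($SYS / $DAYS))
WAVG=$(($WIO / $DAYS))
IAVG=$(($IDLE / $DAYS))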

# print averages
print "Average\t\t$UAVG\t$SAVG\t$WAVG\t$IAVG"

# If 1st of month, add usage for previous month to usage log
date | read x MO DAY time tz YR
if [ $DAY == 1 ]; then
    case $MO in
        Jan) PREV="Dec"; YR=$(($YR - 1));;   # December belongs to the prior year
        Feb) PREV="Jan";;
        Mar) PREV="Feb";;
        Apr) PREV="Mar";;
        May) PREV="Apr";;
        Jun) PREV="May";;
        Jul) PREV="Jun";;
        Aug) PREV="Jul";;
        Sep) PREV="Aug";;
        Oct) PREV="Sep";;
        Nov) PREV="Oct";;
        Dec) PREV="Nov";;
    esac
    # add previous month's usage to system-specific file
    print "$PREV $YR: \t$UAVG\t$SAVG\t$WAVG\t$IAVG" >> /perf/data/$1/usage
fi
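
The shell arithmetic above works on whole numbers, so the monthly averages are truncated. As an alternative, the same averaging can be sketched in a single awk pass that keeps one decimal place. This assumes the Solaris layout shown earlier (Average followed by %usr, %sys, %wio and %idle) and that the line begins with the word Average; avg_cpu is a hypothetical helper, not part of showUsage:

```shell
#!/bin/sh
# Sketch only: average the first "Average" line of each sar report with awk.
# Assumes the layout: Average %usr %sys %wio %idle
# avg_cpu is a hypothetical helper, not part of showUsage.
avg_cpu() {
    awk '
        FNR == 1 { seen = 0 }                 # new file: reset the flag
        /^Average/ && !seen {                 # first Average line per file
            seen = 1
            usr += $2; sys += $3; wio += $4; idle += $5
            days++
        }
        END {
            if (days == 0) { print "No data"; exit 1 }
            printf "Average\t\t%.1f\t%.1f\t%.1f\t%.1f\n",
                   usr / days, sys / days, wio / days, idle / days
        }
    ' "$@"
}

# Example (not run here): avg_cpu /perf/data/boson/sar*
```

Reports from other sar implementations may order the columns differently, in which case the field numbers would need adjusting.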

If I want to see the previous month's usage for all of the systems that I am monitoring, I can run a script like this one:

#!/bin/ksh

print "NAME\t\t\t%usr\t%sys\t%wio\t%idle"

for SYS in `ls /perf/data`
do
    if [ ${#SYS} -lt 6 ]; then
	print -n "$SYS\t"
    else
	print -n "$SYS"
    fi
    /perf/bin/showUsage $SYS | tail -1 | sed "s/Average//"
done

The "${#SYS} -lt 6" line checks the length of the system name and adds a tab if it's shorter than 6 characters. This makes the report line up in nice columns when I email it to myself. I run this script via cron once a month since I'm generally not interested in looking at performance data more often than that, though I can run it any time to see if my servers are being heavily or lightly used. The likely low usage during non-work hours is another issue altogether.
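
Counting characters to decide on an extra tab works for short names but depends on tab stops. One alternative, sketched here rather than taken from the scripts above, is to let printf pad each field to a fixed width; fmt_row is a hypothetical helper and the 16-character name column is an arbitrary choice:

```shell
#!/bin/sh
# Sketch only: fixed-width columns with printf instead of counting tabs.
# fmt_row is a hypothetical helper; the 16-character name width is an
# arbitrary choice for illustration.
fmt_row() {
    # $1 = system name, $2..$5 = %usr %sys %wio %idle
    printf '%-16s%6s%6s%6s%6s\n' "$1" "$2" "$3" "$4" "$5"
}

fmt_row NAME %usr %sys %wio %idle
fmt_row boson 15 5 1 74
fmt_row fermion 0 1 0 99
```

Names longer than 16 characters would still push their row out of line, so the width should be chosen to fit the longest system name.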
