Automating System Comparisons

One of the first questions that enters the mind of someone troubleshooting a system problem is how the system with the problem differs from systems that don't exhibit the problem. Even when systems are very similar with respect to the software they are running, the patches installed, the particular architecture and applications, they can be different in ways that lead to problems. One system might have more swap space. Another might have a newer library. They might mount different file systems or bind to different name servers. Automating the comparisons that are important in your environment can help pinpoint important differences when you need to resolve a problem in a pinch. In this week's post, we're going to look at a script that makes some initial comparisons between two systems to flush out some of the issues that might provide insights during troubleshooting.

In order to run commands remotely without requiring the user to repeatedly enter a password, we're going to assume that either ssh or rsh commands have been enabled from the system on which the script is being run. The command used can be selected in the sixth line of the script.

#!/bin/bash
#
# compare two systems for important differences

# set script to use ssh or rsh
SH=ssh

The compare script expects to be passed two system names and exits with a usage statement when this is not the case.

if [ $# != 2 ]; then
    echo "USAGE: $0 sys1 sys2"
    exit 1
else
    sys1=$1
    sys2=$2
fi

To verify that our commands are going to run without our output being punctuated by password requests, we check the effectiveness of the selected remote command protocol like this:

# check systems for remote command usage
dt1=`$SH $sys1 date 2>/dev/null` || echo "OOPS: cannot $SH to $sys1"
dt2=`$SH $sys2 date 2>/dev/null` || echo "OOPS: cannot $SH to $sys2"

If either of these commands fails to return a date, the corresponding variable is left empty and the script exits with a non-zero status:

if [ "$dt1" == "" ] || [ "$dt2" == "" ]; then
    exit 1
fi

If the script fails at this point, the user will see something like this when using rsh:

# ./compare boson fermion
OOPS: cannot rsh to fermion

If using ssh when password-free access is not allowed, the user will be prompted for a password three times for each system before the script exits.
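
One way to avoid those password prompts with ssh is to tell ssh not to prompt at all. The sketch below assumes the OpenSSH client, whose BatchMode option disables password prompting and whose ConnectTimeout option limits how long a connection attempt may take. The names check_remote, verify_systems and SH_OPTS are invented for this illustration and are not part of the original script, and the options do not apply to rsh.

```shell
#!/bin/bash
# Sketch only: check_remote, verify_systems and SH_OPTS are illustrative
# names, not part of the original compare script. Assumes OpenSSH.
SH=ssh
SH_OPTS="-o BatchMode=yes -o ConnectTimeout=5"

# Succeeds only if "date" runs on the remote host without any prompting.
check_remote() {
    # SH_OPTS is left unquoted on purpose so it splits into separate options
    $SH $SH_OPTS "$1" date >/dev/null 2>&1
}

# Report every unreachable system, then fail if there was at least one.
verify_systems() {
    local sys failed=0
    for sys in "$@"; do
        if ! check_remote "$sys"; then
            echo "OOPS: cannot $SH to $sys"
            failed=1
        fi
    done
    return $failed
}
```

In the compare script this could be called as verify_systems "$sys1" "$sys2" || exit 1, so that a system without password-free access produces the OOPS message right away instead of a series of prompts.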

The first differences the script looks for are differences in the versions and patch levels of the operating systems.

# OS
rev1=`$SH $sys1 "uname -r"`
rev2=`$SH $sys2 "uname -r"`

if [ "$rev1" != "$rev2" ]; then
    echo "Revisions differ"
    echo "$sys1 $rev1"
    echo "$sys2 $rev2"
    echo
fi
ver1=`$SH $sys1 "uname -v"`
ver2=`$SH $sys2 "uname -v"`
if [ "$ver1" != "$ver2" ]; then
    echo "Versions differ"
    echo "$sys1 $ver1"
    echo "$sys2 $ver2"
    echo
fi

The output from this portion of the script might look like this:

Revisions differ
boson 5.9
fermion 5.10

Versions differ
boson Generic_112233-11
fermion Generic_127127-11

The next thing we look at is what file systems are mounted from remote systems.

# what is mounted
$SH $sys1 "df -k | grep :" | sort > /tmp/df-k-1-$$
$SH $sys2 "df -k | grep :" | sort > /tmp/df-k-2-$$
diff /tmp/df-k-1-$$ /tmp/df-k-2-$$ | grep : && echo

The output of these commands will highlight file systems that are mounted on only one of the two systems:

< quark:/data        173570555 130418936 41415914    76%    /net/quark/data
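
Because whole df -k lines are compared, a file system mounted on both systems can also be reported if its usage figures change between the two df runs. A possible refinement, sketched here, compares only the file system name and mount point. The function names remote_mounts and mount_diff are made up for this example, and the sketch assumes each file system occupies a single line of df output.

```shell
#!/bin/bash
# Sketch only: remote_mounts and mount_diff are illustrative names.

# Read "df -k" output on stdin and print "filesystem mountpoint" for each
# remote (host:/path) file system, sorted for comparison.
remote_mounts() {
    awk '$1 ~ /:/ {print $1, $NF}' | sort
}

# Compare two sorted lists: mark lines found only in the first file
# with "<" and lines found only in the second file with ">".
mount_diff() {
    comm -23 "$1" "$2" | sed 's/^/< /'
    comm -13 "$1" "$2" | sed 's/^/> /'
}

# Example use inside the compare script:
#   $SH $sys1 "df -k" | remote_mounts > /tmp/df-k-1-$$
#   $SH $sys2 "df -k" | remote_mounts > /tmp/df-k-2-$$
#   mount_diff /tmp/df-k-1-$$ /tmp/df-k-2-$$
```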

We then use the prtconf command, which provides information on the system components, including installed memory:

# system components
$SH $sys1 "/usr/sbin/prtconf | grep :" > /tmp/prtconf-1-$$
$SH $sys2 "/usr/sbin/prtconf | grep :" > /tmp/prtconf-2-$$
diff /tmp/prtconf-1-$$ /tmp/prtconf-2-$$ | egrep "<|>" && echo

We might see output like this, showing that the systems have different amounts of RAM installed:

< Memory size: 1024 Megabytes
> Memory size: 8192 Megabytes

We also look at swap space.

# swap space
swap1=`$SH $sys1 "/usr/sbin/swap -l" | grep -v swapfile | awk '{print $4,$5}'`
swap2=`$SH $sys2 "/usr/sbin/swap -l" | grep -v swapfile | awk '{print $4,$5}'`
if [ "$swap1" != "$swap2" ]; then
    echo "swapfile blocks free"
    echo $sys1 $swap1
    echo $sys2 $swap2
    echo
fi

The output from a difference in swap space will look something like this:

swapfile blocks free
boson 1052624 946544
fermion 2097392 2097392
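
When a system has more than one swap device, swap -l prints one line per device, and the comparison then works on multi-line strings. One option, sketched below, is to total the two columns before comparing. It assumes the column layout implied by the script (blocks in field 4, free in field 5, and a header line that begins with "swapfile"); swap_totals is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: swap_totals is an illustrative helper. It assumes "swap -l"
# output with a header line starting with "swapfile", blocks in field 4
# and free blocks in field 5.

# Read "swap -l" output on stdin; print total blocks and total free blocks.
swap_totals() {
    awk '$1 != "swapfile" { blocks += $4; free += $5 }
         END { printf "%.0f %.0f\n", blocks, free }'
}

# Example use inside the compare script:
#   swap1=`$SH $sys1 "/usr/sbin/swap -l" | swap_totals`
```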

Another important factor when comparing systems is performance. To get a very quick but essential statistic on system performance, we compare the 15-minute load measurement:

# compare load
load1=`$SH $sys1 "uptime" | awk '{print $NF}'`
load2=`$SH $sys2 "uptime" | awk '{print $NF}'`
echo "load: $sys1 $load1 vs $sys2 $load2"
echo

The single line of output tells us a lot about how much strain each system is under.

load: boson 2.35 vs fermion 0.01
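
If you would rather have the script call attention to a large gap than print both numbers every time, the comparison needs floating-point arithmetic, which the shell's own test command does not provide. A minimal sketch using awk follows; load_gap_exceeds and the threshold of 1.0 are assumptions chosen for illustration, not part of the original script.

```shell
#!/bin/bash
# Sketch only: load_gap_exceeds is an illustrative helper.

# Succeeds (exit status 0) when the two load averages differ by more than
# the given threshold. awk does the floating-point arithmetic.
# Arguments: first load, second load, threshold
load_gap_exceeds() {
    awk -v a="$1" -v b="$2" -v limit="$3" 'BEGIN {
        gap = a - b
        if (gap < 0) gap = -gap
        exit !(gap > limit)
    }'
}

# Example use inside the compare script:
#   if load_gap_exceeds "$load1" "$load2" 1.0; then
#       echo "load differs noticeably: $sys1 $load1 vs $sys2 $load2"
#   fi
```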

The last thing we check in this script is who is logged on. While this may not be an important system difference, it's nearly always useful to know how many people are using the system and who they are:

# who is logged in
echo $sys1
$SH $sys1 who | awk '{print $1}' | sort | uniq -c
echo $sys2
$SH $sys2 who | awk '{print $1}' | sort | uniq -c

This gives us a list of usernames for each system, each preceded by that user's number of login sessions:

boson
   1 jd
   1 chrissie
   2 donboy
   1 ellie
   3 billh
   1 sandra
   1 bigdoe
   1 jonp
  11 godiva
fermion
   6 godiva

Lastly, we clean up any temporary files that we created during our system comparisons.

# clean up
rm /tmp/df-k-1-$$
rm /tmp/df-k-2-$$
rm /tmp/prtconf-1-$$
rm /tmp/prtconf-2-$$
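
File names built from $$ are predictable, and the rm commands at the end of the script never run if it is interrupted partway through. A common alternative, sketched here as a suggestion and not as the original author's method, is to create a private scratch directory with mktemp and let a bash EXIT trap remove it. The variable names are illustrative.

```shell
#!/bin/bash
# Sketch only: tmpdir and the file variables below are illustrative names.

# Create a private scratch directory with an unpredictable name.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/compare.XXXXXX") || exit 1

# Remove the directory and everything in it whenever the script exits.
trap 'rm -rf "$tmpdir"' EXIT

# Scratch files used by the comparisons
df1=$tmpdir/df-k-1
df2=$tmpdir/df-k-2
prtconf1=$tmpdir/prtconf-1
prtconf2=$tmpdir/prtconf-2
```

With this in place near the top of the script, the four rm commands are no longer needed.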