Unix How To: Sorting Log Data


Sorting data in system log files can be a chore, but only when the records are irregular. Whenever a text file of any kind uses uniform delimiters, you can sort the data on any particular column with the sort command.

To sort the /etc/passwd file on the user's full name (the GECOS field), for example, you can do this:

sort -t: -k5,5 /etc/passwd

The -t argument specifies the delimiter used in the file and -k5,5 tells sort to use the 5th field as the sort key. (Older versions of sort wrote this as +4, meaning four fields to the right of the first field; that zero-based form is obsolete and many current versions of sort no longer accept it.) To sort on the UID (third field), you also have to specify that you want a numeric sort.

sort -t: -k3,3n /etc/passwd | more
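To try field-based sorting without touching the real /etc/passwd, here is a small sketch that sorts a few made-up passwd-style records. The sample accounts and the function names sample, by_name and by_uid are invented for illustration; the keys are written in the -k form.

```shell
#!/bin/sh
# Hypothetical passwd-style records; these accounts are made up.
sample() {
    cat <<'EOF'
carol:x:1002:100:Carol Zane:/home/carol:/bin/sh
alice:x:1010:100:Alice Young:/home/alice:/bin/sh
bob:x:999:100:Bob Adams:/home/bob:/bin/sh
EOF
}

# Sort on the 5th colon-separated field (the full name).
by_name() { sort -t: -k5,5; }

# Sort numerically on the 3rd colon-separated field (the UID).
by_uid() { sort -t: -k3,3n; }

# Show login and full name in name order, then login and UID in UID order.
sample | by_name | cut -d: -f1,5
sample | by_uid | cut -d: -f1,3
```

Without the numeric modifier, the UID 999 would land after 1010, because the keys would be compared character by character rather than as numbers.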

Sorting columns in text files is relatively easy if the files use uniform delimiters, whether colons, commas, white space or some other character that appears in the files only as a delimiter. Working with files that mix several delimiters is a considerably harder problem.

Take as an example this line from the /var/adm/messages file:

Dec 10 14:07:13 boson xntpd[544]: [ID 798733 daemon.notice] using kernel phase-lock loop 0041

If all we want to do is sort on the message portion of lines such as these, we can do this in a while loop. Note that the read statement in this sample script stuffs everything after the 5th field into $Message.

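#!/bin/bash

# Print the message portion of each record, then count identical messages.
# read -r keeps backslashes intact; quoting "$Message" prevents the shell
# from expanding wildcard characters that appear in the log text.
while read -r Month Day Time System Process Message
do
    printf '%s\n' "$Message"
done < /var/adm/messages.3 | sort | uniq -c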

We have both white space and colons acting as delimiters in /var/adm/messages, as well as a set of square brackets joining several fields together into a single message component. Consider also that not every record in the file will have the same format. We might also have lines like these:

Dec 10 07:01:33 boson       Fault_PC 0x1035ae0 Esynd 0x0094
Dec 10 12:09:56 boson   last message repeated 1 time

Obviously, some of the fields in /var/adm/messages are optional. This adds another element of complexity to the task of sorting on the file's various components.
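To see what the optional fields do to a fixed list of read variables, here is a small sketch that splits the sample lines the same way the while loop does and labels the result. The function name show_fields is made up for illustration, and the input is the three sample records quoted in this article rather than a real log file.

```shell
#!/bin/sh
# show_fields: split each input line into the same six variables the
# while loop uses, then print what ended up in Process and Message.
show_fields() {
    while read -r Month Day Time System Process Message
    do
        printf 'Process=<%s> Message=<%s>\n' "$Process" "$Message"
    done
}

show_fields <<'EOF'
Dec 10 14:07:13 boson xntpd[544]: [ID 798733 daemon.notice] using kernel phase-lock loop 0041
Dec 10 07:01:33 boson       Fault_PC 0x1035ae0 Esynd 0x0094
Dec 10 12:09:56 boson   last message repeated 1 time
EOF
```

For the second and third records, the first word of the message text ("Fault_PC" and "last") ends up in Process, because read counts fields by position and has no way of knowing that the process name is missing.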

Since the requester mentioned using awk, I have to assume he was attempting to parse his log records. With awk's split and substr functions plus its arrays, you can get around some of the problems I've mentioned, but it would still be a very tricky task to separate the records into logical units.

In my humble opinion, perl would be a better tool to use than awk. With perl's regular expressions, you can denote optional fields such as those showing up as "xntpd[544]:" and "[ID 798733 daemon.notice]" in our sample record above.
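The idea of marking fields as optional in a regular expression can be sketched in the shell as well. The following is only an illustration using sed with extended regular expressions, not a Perl script; it assumes a sed that accepts -E, assumes a process tag always looks like name[pid]: as in the sample record, and uses a made-up function name, split_record.

```shell
#!/bin/sh
# split_record: rewrite each syslog-style line as
#   date time|host|process|ID tag|message
# The process and [ID ...] parts are wrapped in (...)? so they may be
# absent; a missing part is simply left empty in the output.
# Assumption: a process tag has the form name[pid]: and nothing else.
split_record() {
    sed -E 's/^([A-Z][a-z]+ +[0-9]+ [0-9:]+) +([^ ]+) +(([^ ]+\[[0-9]+\]): +)?((\[ID [^]]*\]) +)?(.*)$/\1|\2|\4|\6|\7/'
}

split_record <<'EOF'
Dec 10 14:07:13 boson xntpd[544]: [ID 798733 daemon.notice] using kernel phase-lock loop 0041
Dec 10 12:09:56 boson   last message repeated 1 time
EOF
```

Once every record has the same five parts, the earlier trick applies again: piping the output through sort -t'|' -k5 would sort on the message text alone, provided the messages themselves never contain a | character.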

Here's a quick stab at a perl script that breaks out the date/time, system name, optional process and alert messages and the message text.
