December 09, 2009, 12:25 PM — Sorting data in system log files certainly can be a chore, but only under certain circumstances. Whenever you have a text file of any kind with uniform delimiters, you can sort the data on any particular column with the sort command.
To sort the /etc/passwd file on the user's full name (the GECOS field), for example, you can do this:
sort -t: -k5 /etc/passwd
The -t argument specifies the delimiter used in the file and -k5 tells the system to sort starting at the 5th field. Older versions of sort also accept the obsolete form +4, which counts from zero (four fields to the right of the first field), but current versions treat +4 as a file name. To sort on the UID (third field), you would additionally have to specify that you want a numeric sort.
sort -n -t: -k3 /etc/passwd | more
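To try these sorts without touching the real /etc/passwd, here is a small sketch that runs them on three made-up passwd-style records (the user names and UIDs are invented for illustration). It uses the -k form of the key option, in which fields are counted from 1, and writing the key as -k5,5 ends the key at the same field where it starts.

```shell
#!/bin/bash
# Use the C locale so the ordering is predictable from run to run.
export LC_ALL=C

# Hypothetical passwd-style records; the names and UIDs are invented.
sample='root:x:0:0:Super-User:/:/sbin/sh
zed:x:1002:10:Adam Young:/home/zed:/bin/bash
amy:x:101:10:Zoe Brown:/home/amy:/bin/bash'

# Sort on the 5th field (the full name), using ":" as the delimiter.
by_name=$(printf '%s\n' "$sample" | sort -t: -k5,5)

# Sort numerically on the 3rd field (the UID).
by_uid=$(printf '%s\n' "$sample" | sort -t: -n -k3,3)

echo "By name:"; echo "$by_name"
echo "By UID:";  echo "$by_uid"
```

Without -n, the UIDs would be compared as text and 1002 would land ahead of 101, which is why the numeric flag matters for the UID column.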
Sorting columns in text files is relatively easy if the files use uniform delimiters, whether colons, commas, white space or some other character that appears in the files only as a delimiter. Working with more complex files that involve several different delimiters is a considerably harder problem.
Take as an example this line from the /var/adm/messages file:
Dec 10 14:07:13 boson xntpd: [ID 798733 daemon.notice] using kernel phase-lock loop 0041
If all you want to do is sort on the message portion of lines such as these, you could do this in a while loop. Note that the read statement in this sample script names six variables, so everything after the 5th field is stuffed into $Message.
#!/bin/bash
# Collect the message portion of each line, then count the duplicates.
while read -r Month Day Time System Process Message
do
    echo "$Message" >> /tmp/msgs$$
done < /var/adm/messages.3
sort /tmp/msgs$$ | uniq -c
rm /tmp/msgs$$
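The same idea can be sketched without the temporary file by piping the loop straight into sort. This is only an illustration: count_messages is a hypothetical helper name, and the input is the sample record repeated with altered timestamps plus one made-up third line, not real log data.

```shell
#!/bin/bash
# count_messages is a hypothetical helper; it reads log lines on stdin,
# keeps everything after the 5th field, and counts identical messages.
count_messages() {
    while read -r Month Day Time System Process Message
    do
        printf '%s\n' "$Message"
    done | sort | uniq -c
}

# Made-up input: the sample record twice, plus one invented message.
result=$(count_messages <<'EOF'
Dec 10 14:07:13 boson xntpd: [ID 798733 daemon.notice] using kernel phase-lock loop 0041
Dec 10 14:09:13 boson xntpd: [ID 798733 daemon.notice] using kernel phase-lock loop 0041
Dec 10 14:10:02 boson xntpd: [ID 798733 daemon.notice] some other message
EOF
)
echo "$result"
```

The duplicated message is reported with a count of 2 and the invented one with a count of 1. To run it against a real log, you would redirect the file into the function instead of the inline sample.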
We have both white space and colons acting as delimiters in /var/adm/messages as well as a set of square brackets joining several fields together into a single message component. Consider also that not every record in the file will have the same format. We might also have lines like these:
Dec 10 07:01:33 boson Fault_PC 0x1035ae0 Esynd 0x0094
Dec 10 12:09:56 boson last message repeated 1 time
Obviously, some of the fields in /var/adm/messages are optional. This adds another element of complexity to the task of sorting on the file's various components.
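To make the problem concrete, this short sketch applies the same six-name read used in the script above to one record with a process tag and one without. The helper name split_record is hypothetical. In the second record there is no process field, so the first word of the message lands in $Process.

```shell
#!/bin/bash
# split_record is a hypothetical helper: it splits one line with the same
# six-name read as the script above and reports two of the variables.
split_record() {
    read -r Month Day Time System Process Message <<EOF
$1
EOF
    printf 'Process=%s | Message=%s\n' "$Process" "$Message"
}

with_tag=$(split_record 'Dec 10 14:07:13 boson xntpd: [ID 798733 daemon.notice] using kernel phase-lock loop 0041')
no_tag=$(split_record 'Dec 10 12:09:56 boson last message repeated 1 time')

echo "$with_tag"   # Process=xntpd: | Message=[ID 798733 daemon.notice] using kernel ...
echo "$no_tag"     # Process=last | Message=message repeated 1 time
```

A sort on $Message would therefore file "last message repeated 1 time" under "message repeated 1 time", which is not what you want.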
Since the requester mentioned using awk, I have to assume he was attempting to parse his log records. With awk's split and substr functions plus its arrays, you can get around some of the problems I've mentioned, but it would still be a very tricky task to separate the records into logical units.
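As a rough idea of what the awk approach might look like, here is a sketch that uses split and substr to pull out the system name, an optional process tag and the message. It rests on an assumption drawn only from the sample records: that the 5th field is a process tag when, and only when, it ends in a colon. The helper name parse_line and the "|" output separator are mine, and rejoining the words collapses any runs of white space in the message.

```shell
#!/bin/bash
# parse_line is a hypothetical helper. Assumption: field 5 is a process tag
# only if it ends in ":"; otherwise it is treated as part of the message.
parse_line() {
    printf '%s\n' "$1" | awk '{
        n = split($0, f, " ")                  # break the record on white space
        proc = "-"; start = 5                  # default: no process tag
        if (substr(f[5], length(f[5])) == ":") {
            proc = substr(f[5], 1, length(f[5]) - 1)   # drop the trailing colon
            start = 6
        }
        msg = f[start]
        for (i = start + 1; i <= n; i++) msg = msg " " f[i]
        print f[4] "|" proc "|" msg            # system|process|message
    }'
}

parse_line 'Dec 10 14:07:13 boson xntpd: [ID 798733 daemon.notice] using kernel phase-lock loop 0041'
parse_line 'Dec 10 12:09:56 boson last message repeated 1 time'
```

Even this small sketch leaves the bracketed "[ID ...]" portion inside the message, which hints at how quickly the awk code would grow if every optional component had to be separated out.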
In my humble opinion, perl would be a better tool to use than awk. With perl's regular expressions, you can denote optional fields such as those showing up as "xntpd:" and "[ID 798733 daemon.notice]" in our sample record above.
Here's a quick stab at a perl script that breaks out the date/time, system name, optional process and alert messages and the message text.