Unix: Sorting information that isn't quite numeric

Sorting data numerically and alphanumerically isn't generally much of a challenge on Unix systems, but sometimes .5 is smaller than .11.


Sorting numbers is extremely simple on Unix systems; just use the -n option with your sort commands. But when the numeric information that we want to sort is broken up into chunks of uneven size, as it generally is with IP addresses and section numbers within documents, we need something more than just -n. An address with 5 in a given octet, for example, should show up earlier in our sorted output than one with 12 in the same octet, just as Section 3.5 in a document precedes 3.12. So, what do we do?

To begin, we need to prepare ourselves for using a more detailed sort command. One of the key options we will need is sort's -t. It works like the -F option of awk, allowing us to specify that our field separator should be a dot and, thus, allowing us to address each field in our addresses separately. Starting our sort command with sort -t . will allow each IP address (IPv4 anyway) to be sorted with respect to each of its four octets (i.e., each byte in the address).
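To make the comparison with awk concrete, here is a minimal sketch that splits the same lines on dots with both tools. The two addresses are made up for illustration, and the -k3,3n key (third field only, compared numerically) is just one way to pick out a single field.

```shell
# Hypothetical sample addresses, used only for illustration.
addrs='10.0.12.7
10.0.3.40'

# awk -F. splits each line on dots; print the third field of each line.
third=$(printf '%s\n' "$addrs" | awk -F. '{ print $3 }')
printf '%s\n' "$third"

# sort -t . splits on the same separator; -k3,3n compares only the
# third field, and compares it as a number.
by_third=$(printf '%s\n' "$addrs" | sort -t . -k3,3n)
printf '%s\n' "$by_third"
```

In both commands the dot plays the same role: it tells the tool where one field ends and the next begins.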

Before we get into the command for sorting IP addresses, however, let's first look at some simpler examples of sort commands. In this first display, we contrast an alphanumeric sort with a numeric sort. The results are quite different, of course, but the numeric sort works just like we'd expect. The resultant list is clearly in numeric order.

$ cat nums	$ sort nums	$ sort -n nums
98.4		1		1
98.15		100.9		7
98.9		11		11
100.9		21		21
67		67		67
21		7		98.15
11		98.15		98.4
7		98.4		98.9
1		98.9		100.9

The problem with IP addresses is that an octet of 15 isn't smaller than an octet of 9, even though sort -n treats 98.15 as a decimal number that is smaller than 98.9. A plain numeric sort of the whole line (unless the fields are padded with zeroes) therefore isn't going to work. Another option that sort provides, however, is -k, which allows us to sort on a particular field of our data. If we want, for example, to sort a series of phone numbers on just the last four digits, we could use a command like this:

$ sort -t - -k3 phones

This command tells sort to sort the data on the third field using a hyphen as the field separator. So, the numbers get sorted on just the last 4 digits, often referred to as "the extension". Phone numbers aren't much of a challenge because they all have the same number of digits (ignoring the possibility of there being both local and international numbers in the list). Sorting numerically on the same-length numeric fields does just what we'd expect.

$ sort -n phones
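As a rough illustration of the -t - -k3 command above, the sketch below sorts three invented phone numbers on their last four digits. The numbers are hypothetical, and a shell variable stands in for the phones file so the example is self-contained.

```shell
# Hypothetical phone numbers, made up for this example.
phones='301-555-0142
212-555-0199
415-555-0107'

# Use a hyphen as the field separator and sort on the third field,
# i.e. the last four digits of each number.
by_last4=$(printf '%s\n' "$phones" | sort -t - -k3)
printf '%s\n' "$by_last4"
```

Because every third field here has exactly four digits, the default text comparison and a numeric comparison give the same order.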

IP addresses, on the other hand, can have anywhere from one to three digits in any field. So, if we want to see 3 in the resulting list before we see 11 in the same octet, we have to work a little harder. Just specifying that our field separator is a dot doesn't quite cut it. With only the separator specified, sort still compares the lines as text, so an octet of 3 follows an octet of 11 in the list.

$ sort -t .
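One common way to get the octet-by-octet order described above is to give sort a separate numeric key for each of the four fields. The sketch below contrasts that with a sort that only sets the separator. The addresses are invented for illustration, LC_ALL=C is added only to make the text comparison predictable across locales, and this is not necessarily the exact command the article goes on to use.

```shell
# Hypothetical IPv4 addresses, made up for this example.
addrs='192.168.1.11
192.168.1.3
10.20.3.4
10.3.20.4
192.168.1.100'

# Separator only: no key is given, so whole lines are compared as text.
text_order=$(printf '%s\n' "$addrs" | LC_ALL=C sort -t .)
printf '%s\n\n' "$text_order"

# One numeric key per octet: -kN,Nn limits each key to field N and
# compares it as a number, so 3 sorts before 11 and 11 before 100.
octet_order=$(printf '%s\n' "$addrs" | sort -t . -k1,1n -k2,2n -k3,3n -k4,4n)
printf '%s\n' "$octet_order"
```

Restricting each key to a single field (-k1,1 rather than -k1) matters here: without the end position, a key runs to the end of the line and the later octets are never compared on their own.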