Unix: Sorting information that isn't quite numeric

Sorting data numerically and alphanumerically isn't generally much of a challenge on Unix systems, but sometimes .5 is smaller than .11.

Sorting numbers is extremely simple on Unix systems; just use the -n option with your sort commands. But when the numeric information that we want to sort is broken up into chunks of uneven size, as it generally is with IP addresses and section numbers within documents, we need something more than just -n. An address like 10.3.45.7, for example, will show up earlier in our sorted output than 10.3.7.11, just as Section 3.12 will land ahead of Section 3.5 even though 3.5 precedes 3.12 in the document. So, what do we do?

To begin, we need to prepare ourselves for using a more detailed sort command. One of the key options we will need is sort's -t. It works like the -F option of awk, allowing us to specify that our field separator should be a dot and, thus, allowing us to address each field in our addresses separately. Starting our sort command with sort -t . will allow each IP address (IPv4 anyway) to be sorted with respect to each of its four octets (i.e., each byte in the address).

Before we get into the command for sorting IP addresses, however, let's first look at some simpler examples of sort commands. In this first display, we contrast an alphanumeric sort with a numeric sort. The results are quite different, of course, but the numeric sort works just like we'd expect. The resultant list is clearly in numeric order.

$ cat nums	$ sort nums	$ sort -n nums
98.4		1		1
98.15		100.9		7
98.9		11		11
100.9		21		21
67		67		67
21		7		98.15
11		98.15		98.4
7		98.4		98.9
1		98.9		100.9

The problem with IP addresses is that, within an octet, .15 isn't smaller than .9, so sorting whole addresses as if they were decimal numbers (unless the octets are padded with zeroes) isn't going to work. Another option that sort provides, however, is -k, which allows us to sort on a particular field of our data. If we want, for example, to sort a series of phone numbers on just the last four digits, we could use a command like this:

$ sort -t - -k3 phones
410-290-1225
949-987-1234
301-945-1264
410-290-6543
640-465-9681

This command tells sort to sort the data on the third field using a hyphen as the field separator. So, the numbers get sorted on just the last 4 digits (the line number). Phone numbers aren't much of a challenge because they all have the same number of digits (ignoring the possibility of there being both local and international numbers in the list). Sorting numerically on the same-length numeric fields does just what we'd expect.

$ sort -n phones
301-945-1264
410-290-1225
410-290-6543
640-465-9681
949-987-1234
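
The -t and -k options can also be combined to sort on more than one field. The following is only a sketch: it pipes in the phone numbers shown above with printf instead of reading the phones file, and the variable names are made up for illustration. LC_ALL=C is set just to keep the results the same across locales.

```shell
# Sample data copied from the listing above (hypothetical variable name).
phones='410-290-6543
949-987-1234
301-945-1264
410-290-1225
640-465-9681'

# Sort on the second field first, then on the third field to break ties.
# "2,2n" means: the key starts and ends at field 2 and is compared numerically.
by_exchange=$(printf '%s\n' "$phones" | LC_ALL=C sort -t - -k 2,2n -k 3,3n)
printf '%s\n' "$by_exchange"
```

The two 410-290 numbers end up next to each other, with 1225 ahead of 6543.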

IP addresses, on the other hand, can have anywhere from one to three digits in any field so, if we want to see 3 in the resulting list before we see 11 in the same octet, we have to work a little harder. Just specifying that our field separator is a dot doesn't quite cut it. Notice here how 10.3.7.11 follows 10.3.45.7 in the list.

$ sort -t . IPs
10.1.12.98
10.2.99.21
10.3.45.67
10.3.45.7
10.3.7.11
192.168.0.1

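Throwing -n into the mix doesn't help either. With no key specified, -n reads each line as a single number (10.3 for three of these addresses) and breaks the ties by comparing the lines character by character, so 10.3.7.11 still falls after 10.3.45.7 because 7 is larger than 4.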

$ sort -n -t . IPs
10.1.12.98
10.2.99.21
10.3.45.67
10.3.45.7
10.3.7.11
192.168.0.1
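
One way to see what -n is doing here is to switch off sort's tie-breaking. This sketch assumes a sort that supports the -s (stable) option, as GNU and BSD sort do, and it feeds in two of the addresses with printf rather than reading the IPs file. The variable names are only for illustration.

```shell
# To -n, both lines start with the number 10.3, so they compare as equal.
# -s disables the last-resort comparison, so tied lines keep their input order.
stable=$(printf '10.3.7.11\n10.3.45.7\n' | LC_ALL=C sort -s -n -t .)
printf '%s\n' "$stable"

# Without -s, the tie is broken by comparing the lines character by character,
# which puts "10.3.4..." ahead of "10.3.7...".
default=$(printf '10.3.7.11\n10.3.45.7\n' | LC_ALL=C sort -n -t .)
printf '%s\n' "$default"
```

The first command leaves the two addresses in the order they arrived; the second puts 10.3.45.7 first, as in the listing above.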

If we want to sort just on the rightmost field in a set of IP addresses, we could do this by instructing sort to use just the 4th field:

$ sort -n -t . -k 4 IPs
192.168.0.1
10.3.45.7
10.3.7.11
10.2.99.21
10.3.45.67
10.1.12.98

Here, we see that the sort command is sorting on the 4th octet properly with 7 showing up in the list before 11 and so on. This demonstrates what we need to do on the addresses as a whole -- looking at the contents of each field separately and ignoring the dots that separate them. To sort IP addresses on all four fields, each of the four fields needs to be specified in the sort command. Either of these commands should do the trick:

$ sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 IPs
$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n IPs

The -n is either applied to the entire command (version 1) or to each field individually (version 2). The "1,1", "2,2" etc. parts of these commands specify where each sort key starts and ends, so "1,1" means the first field and nothing else. The order of the -k options sets the priority: the first field is compared first, the second is used to break ties, and so on. And, of course, we could use this kind of sort command in a pipe as well:

$ cat IPs | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
10.1.12.98
10.2.99.21
10.3.7.11
10.3.45.7
10.3.45.67
192.168.0.1
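
The same per-field approach should carry over to the section numbers mentioned at the start. This is a small sketch with made-up two-level section numbers, contrasting a plain numeric sort with one that treats each side of the dot as its own key.

```shell
# Hypothetical section numbers, one per line.
sections='3.12
3.5
10.1
3.1
2.10'

# Plain -n reads each line as a decimal number, so 3.12 sorts ahead of 3.5.
as_decimals=$(printf '%s\n' "$sections" | LC_ALL=C sort -n)
printf '%s\n' "$as_decimals"

# One numeric key per field puts 3.5 ahead of 3.12, as in a table of contents.
by_field=$(printf '%s\n' "$sections" | LC_ALL=C sort -t . -k 1,1n -k 2,2n)
printf '%s\n' "$by_field"
```

Deeper numbering (3.5.1 and so on) would need one more -k option for each extra level.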

The sort command has some additional options as well. Some that I have found quite useful are:

-r  reverse the order of the sort
-b  ignore leading blanks (as far as the sort is concerned; it will not remove them)
-c  check and report on whether the input is in sort order
-M  sort in month order where months are Jan, Feb, etc.
-m  merge files that are already sorted, without re-sorting them
-u  remove duplicates

Here are some examples of commands using these options:

Sort dates in month order.

$ sort -M dates
Jan 4
Jan 8
Feb 2
Mar 18
Apr 26
May 1
Jun 26
Jul 26
Aug 6
Sep 10
Sep 14
Sep 23
Sep 25
Sep 4
Oct 19
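
Notice that Sep 4 lands after Sep 25 in this output: -M puts the months in order, but the days within a month are still compared as characters. If the days need to be in order too, one approach is to give sort a month key and a numeric day key. This is a sketch using a few made-up dates piped in with printf rather than the dates file.

```shell
# Hypothetical sample dates, deliberately out of order.
dates='Sep 4
Sep 25
Jan 8
Sep 10
Jan 4
Feb 2'

# Key 1 is compared as a month name (M); key 2 is compared as a number (n).
by_day=$(printf '%s\n' "$dates" | LC_ALL=C sort -k 1,1M -k 2,2n)
printf '%s\n' "$by_day"
```

With the second key in place, Sep 4 comes out ahead of Sep 10 and Sep 25.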

Sort dates in reverse month order.

$ sort -M -r dates
Oct 19
Sep 4
Sep 25
Sep 23
Sep 14
Sep 10
Aug 6
Jul 26
Jun 26
May 1
Apr 26
Mar 18
Feb 2
Jan 8
Jan 4

Merge two files of dates and display in month order.

$ sort -m dates dates2 | sort -M
Jan 1
Jan 4
Jan 8
Feb 2
Mar 18
Mar 18
Apr 26
May 1
Jun 26
Jul 26
Aug 6
Sep 10
Sep 14
Sep 23
Sep 25
Sep 4
Oct 19
Nov 28
Dec 25

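Do the same thing, but remove duplicate dates. Keep in mind that -m expects its input files to be sorted already, so -u only catches duplicates that land next to each other in the merged stream.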

$ sort -m -u dates dates2 | sort -M
Jan 1
Jan 4
Jan 8
Feb 2
Mar 18
Apr 26
May 1
Jun 26
Jul 26
Aug 6
Sep 10
Sep 14
Sep 23
Sep 25
Sep 4
Oct 19
Nov 28
Dec 25
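
If the input files are not known to be sorted, a variation that does not rely on -m may be safer. The sketch below uses two small temporary files as stand-ins for dates and dates2 (their contents are made up). It removes exact duplicates with a plain sort -u first and only then puts the result in month and day order.

```shell
# Hypothetical stand-ins for the dates and dates2 files, in no particular order.
tmp=$(mktemp -d)
printf 'Sep 4\nMar 18\nJan 8\n' > "$tmp/dates"
printf 'Dec 25\nJan 1\nMar 18\n' > "$tmp/dates2"

# First pass: plain sort -u compares whole lines, so only exact duplicates go.
# Second pass: order what is left by month name, then by day number.
merged=$(LC_ALL=C sort -u "$tmp/dates" "$tmp/dates2" | LC_ALL=C sort -k 1,1M -k 2,2n)
printf '%s\n' "$merged"

rm -rf "$tmp"
```

The two passes are kept separate on purpose. When -u is combined with -M in a single command, lines whose month keys compare equal count as duplicates, so more than the exact repeats can disappear.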

I have run into enough situations where sorting data by IP address has saved me a lot of time and effort that I have turned my sort-by-IP command into an alias that I can now use anytime I need it.

$ alias byIP='sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4'
$ getNodes | byIP
10.1.12.98
10.2.99.21
10.3.7.11
10.3.45.7
10.3.45.67
192.168.0.1
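
An alias is handy at the prompt, but bash does not expand aliases in scripts by default. For use in a script, a shell function is one alternative. This is a sketch only; the function name by_ip is made up here to avoid clashing with the alias, and the addresses are piped in with printf as a stand-in for a real source of addresses.

```shell
# Function counterpart to the byIP alias; "$@" passes along any file names,
# so it works both in a pipe and with files as arguments.
by_ip() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n "$@"
}

sorted=$(printf '10.3.45.7\n192.168.0.1\n10.3.7.11\n10.1.12.98\n' | by_ip)
printf '%s\n' "$sorted"
```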

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.
