Unix: Sorting information that isn't quite numeric

Sorting data numerically and alphanumerically isn't generally much of a challenge on Unix systems, but sometimes .5 is smaller than .11.

Sorting numbers is extremely simple on Unix systems; just use the -n option with your sort commands. But when the numeric information that we want to sort is broken up into chunks of uneven size, as it generally is with IP addresses and section numbers within documents, we need something more than just -n. An address like 10.3.45.7, for example, will show up earlier in our sorted output than 10.3.7.11 even though 7 is smaller than 45, just as Section 3.12 in a document will land ahead of Section 3.5 when the two are compared as decimal numbers. So, what do we do?

To begin, we need to prepare ourselves for using a more detailed sort command. One of the key options we will need is sort's -t. It works like the -F option of awk, allowing us to specify that our field separator should be a dot and, thus, allowing us to address each field in our addresses separately. Starting our sort command with sort -t . is the first step toward sorting each IP address (IPv4 anyway) with respect to each of its four octets (i.e., each byte in the address).
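To make the parallel with awk concrete, here is a small sketch. The two sample addresses are the ones mentioned earlier; picking the third field, and the variable names, are only for illustration.

```shell
# Split one address on dots, the way awk -F does, and pull out field 3.
addr='10.3.45.7'
third=$(printf '%s\n' "$addr" | awk -F . '{ print $3 }')
echo "$third"        # prints 45

# Give sort the same separator with -t, then compare only field 3, numerically.
by_third=$(printf '%s\n' 10.3.45.7 10.3.7.11 | LC_ALL=C sort -t . -k 3,3n)
echo "$by_third"     # 10.3.7.11 comes first because 7 < 45
```

On its own, -t only defines where the fields are. It is the -k keys that put those fields to use.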

Before we get into the command for sorting IP addresses, however, let's first look at some simpler examples of sort commands. In this first display, we contrast an alphanumeric sort with a numeric sort. The results are quite different, of course, but the numeric sort works just like we'd expect. The resultant list is clearly in numeric order.

$ cat nums	$ sort nums	$ sort -n nums
98.4		1		1
98.15		100.9		7
98.9		11		11
100.9		21		21
67		67		67
21		7		98.15
11		98.15		98.4
7		98.4		98.9
1		98.9		100.9
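To try the comparison yourself, something like the following should work. It is a sketch that writes the sample values to a temporary file (the file itself and the variable names are assumptions for the example) and sets LC_ALL=C so the alphanumeric sort is a plain character-by-character comparison regardless of your locale.

```shell
# Recreate the sample data in a temporary file.
nums=$(mktemp)
printf '%s\n' 98.4 98.15 98.9 100.9 67 21 11 7 1 > "$nums"

# Alphanumeric sort: compares character by character, so 100.9 lands ahead of 11.
plain=$(LC_ALL=C sort "$nums")

# Numeric sort: compares the values as decimal numbers.
numeric=$(LC_ALL=C sort -n "$nums")

echo "$plain"
echo "---"
echo "$numeric"

rm -f "$nums"
```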

The problem with IP addresses is that, within an address, .15 isn't smaller than .9 the way it is in a decimal number, and so a numeric sort of the whole address (unless the fields are padded with zeroes) isn't going to work. Another option that sort provides, however, is -k, which allows us to sort on a particular field of our data. If we want, for example, to sort a series of phone numbers on just the last four digits, we could use a command like this:

$ sort -t - -k3 phones
410-290-1225
949-987-1234
301-945-1264
410-290-6543
640-465-9681

This command tells sort to sort the data on the third field using a hyphen as the field separator. So, the numbers get sorted on just the last four digits, often referred to as the line number. Phone numbers aren't much of a challenge because they all have the same number of digits (ignoring the possibility of there being both local and international numbers in the list). Sorting numerically on same-length numeric fields does just what we'd expect. Here sort -n orders the list by the leading number on each line, the area code.
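One detail worth knowing: -k3 on its own means "from field 3 to the end of the line," which happens to be the same thing here because field 3 is the last field. A slightly more explicit form, sketched below with the same phone numbers, bounds the key with -k 3,3 and adds n to compare it as a number. The variable names are made up for the example.

```shell
phones='410-290-6543
301-945-1264
949-987-1234
410-290-1225
640-465-9681'

# -k 3,3 starts and ends the key at field 3; the trailing n makes it numeric.
by_last4=$(printf '%s\n' "$phones" | LC_ALL=C sort -t - -k 3,3n)
echo "$by_last4"

# The same idea applied to the middle field. 290 appears twice, so that tie
# falls back to sort's comparison of the whole line.
by_middle=$(printf '%s\n' "$phones" | LC_ALL=C sort -t - -k 2,2n)
echo "$by_middle"
```

Bounding the key matters more once a line has fields after the one you care about, which is exactly the situation with the octets of an IP address.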

$ sort -n phones
301-945-1264
410-290-1225
410-290-6543
640-465-9681
949-987-1234
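Why does this come out in area-code order? With -n, sort reads the number at the start of each line and stops at the first character that can't be part of it, which is the hyphen here. A quick sketch to confirm that, using the same numbers (variable names are only illustrative), compares the result with an explicit numeric key on field 1.

```shell
phones='410-290-6543
301-945-1264
949-987-1234
410-290-1225
640-465-9681'

# Numeric sort of the whole line: only the leading number (the area code) counts.
whole=$(printf '%s\n' "$phones" | LC_ALL=C sort -n)

# The same thing spelled out: hyphen-separated fields, numeric key on field 1.
first=$(printf '%s\n' "$phones" | LC_ALL=C sort -t - -k 1,1n)

if [ "$whole" = "$first" ]; then
    echo "same order"
fi
```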

IP addresses, on the other hand, can have anywhere from one to three digits in any field, so if we want to see 3 in the resulting list before we see 11 in the same octet, we have to work a little harder. Just specifying that our field separator is a dot doesn't quite cut it. Notice here how 10.3.7.11 follows 10.3.45.7 in the list.

$ sort -t .
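One common way to get octet-by-octet ordering, sketched here on the assumption of a POSIX-style sort, is to give each of the four fields its own bounded numeric key. The sample addresses other than 10.3.45.7 and 10.3.7.11 are made up for illustration, as are the variable names.

```shell
addrs='10.3.45.7
10.3.7.11
10.20.1.1
10.3.7.3
9.255.0.1'

# Separator alone: still a character-by-character comparison of each line,
# so 10.3.45.7 comes out ahead of 10.3.7.11.
plain=$(printf '%s\n' "$addrs" | LC_ALL=C sort -t .)
echo "$plain"
echo "---"

# One numeric key per octet: compare field 1, then 2, then 3, then 4.
by_octet=$(printf '%s\n' "$addrs" | LC_ALL=C sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n)
echo "$by_octet"
```

Each key is written as N,N so that it stops at the end of its own field, and each carries its own n so the comparison is numeric for that octet only.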