Sorting in Perl II (Schwartzian Transform)

Last week we explored using Perl's sort() function for simple sorting.

This week we will use the Schwartzian Transform (named after Randal

Schwartz, and often abbreviated to just ST) to demonstrate the sort()

functions ability to efficiently sort fields requiring extraction or

some computation..

Consider sorting colon-separated records on one or more fields. Here is

some sample data:

foo:23:bar:2.1

bar:42:qux:3.0

baz:19:foo:1.1

aux:19:foo:1.2

Now, the following attempts to sort this data on the 3rd data field:

#!/usr/bin/perl -w

use strict;

my @data =

; my @sorted = sort custom @data; print @sorted; sub custom { (split /:/,$a)[2] cmp (split /:/,$b)[2]; } __DATA__ foo:23:bar:2.1 bar:42:qux:3.0 baz:19:foo:1.1 aux:19:foo:1.2 Although the script works, we have an efficiency problem here. When sorting a list, the comparison function (custom() in this case) is called many times and, in the above case, performs 2 split() operations each time. A far better approach preprocesses the data, allowing more direct access to the sort fields and any operations performed on the data need only be performed once per record. After sorting, the data can then be post-processed to obtain the original records. Here is one way of doing that with each step separated: #!/usr/bin/perl -w use strict; my @data = ; my @pre = map { [$_, split /:/ ] } @data; my @post = sort custom @pre; my @sorted = map { $_->[0] } @post; print @sorted; sub custom { $a->[3] cmp $b->[3]; } __DATA__ foo:23:bar:2.1 bar:42:qux:3.0 baz:19:foo:1.1 aux:19:foo:1.2 In the above, we first preprocess our records into a list of anonymous arrays where the first element is the entire record and the remaining elements are the split() fields of that record. We then sort those anonymous arrays on the field we desire (in this case, the third index of each anonymous array). Now our @post array contains a sorted list of anonymous arrays sorted on the correct field and we merely extract the first element of each anon array to recover our original records. The three steps above can be combined by simply passing the results of the preprocessing stage directly into the sort stage and passing those results through the post-processing stage and out into our sorted array: #!/usr/bin/perl -w use strict; my @data = ; my @sorted = map { $_->[0] } sort custom map { [$_, split /:/ ] } @data; print @sorted; sub custom { $a->[3] cmp $b->[3]; } __DATA__ foo:23:bar:2.1 bar:42:qux:3.0 baz:19:foo:1.1 aux:19:foo:1.2 This 'map/sort/map' algorithm, referred to as the ST, provides a nice clean and relatively efficient way to sort on computed or extracted fields. We can also easily adjust our custom() comparison routine to sort on multiple fields as shown in last week
What’s wrong? The new clean desk test
Join the discussion
Be the first to comment on this article. Our Commenting Policies