Sorting in Perl II (Schwartzian Transform)

By Andrew Johnson, ITworld |  How-to

Last week we explored using Perl's sort() function for simple sorting.
This week we will use the Schwartzian Transform (named after Randal
Schwartz, and often abbreviated to just ST) to demonstrate the sort()
functions ability to efficiently sort fields requiring extraction or
some computation..

Consider sorting colon-separated records on one or more fields. Here is
some sample data:

foo:23:bar:2.1
bar:42:qux:3.0
baz:19:foo:1.1
aux:19:foo:1.2

Now, the following attempts to sort this data on the 3rd data field:

#!/usr/bin/perl -w
use strict;
my @data = ;
my @sorted = sort custom @data;
print @sorted;

sub custom {
(split /:/,$a)[2] cmp (split /:/,$b)[2];
}
__DATA__
foo:23:bar:2.1
bar:42:qux:3.0
baz:19:foo:1.1
aux:19:foo:1.2

Although the script works, we have an efficiency problem here. When
sorting a list, the comparison function (custom() in this case) is
called many times and, in the above case, performs 2 split() operations
each time. A far better approach preprocesses the data, allowing more
direct access to the sort fields and any operations performed on the
data need only be performed once per record. After sorting, the data
can then be post-processed to obtain the original records. Here is one
way of doing that with each step separated:

#!/usr/bin/perl -w
use strict;
my @data = ;
my @pre = map { [$_, split /:/ ] } @data;
my @post = sort custom @pre;
my @sorted = map { $_->[0] } @post;
print @sorted;

sub custom {
$a->[3] cmp $b->[3];
}
__DATA__
foo:23:bar:2.1
bar:42:qux:3.0
baz:19:foo:1.1
aux:19:foo:1.2

In the above, we first preprocess our records into a list of anonymous
arrays where the first element is the entire record and the remaining
elements are the split() fields of that record. We then sort those
anonymous arrays on the field we desire (in this case, the third index
of each anonymous array). Now our @post array contains a sorted list of
anonymous arrays sorted on the correct field and we merely extract the
first element of each anon array to recover our original records.

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Ask a Question
randomness