Unix: Avoiding repetitious work with sed

The only thing wrong with sed is that most of us barely grasp how powerful it is. We use it in pipes to change "this" to "that", but we rarely consider it for the big editing jobs where it could save us considerable time and effort -- especially when it comes to boring repetitious work.

sed is an extremely useful utility, but is generally used to substitute text in very simple pipes when, in truth, it can do a LOT more. Both sed commands and standalone sed scripts can perform a lot of useful text editing -- likely a lot more than most of us have ever considered. In today's post, we're going to look at some sample sed commands and some scripts that demonstrate how easily sed can save us a lot of time.

To begin, let's look at how sed is generally used in scripts and on the command line. On the command line, we might do something like this:

sed 's/this/that/' infile > outfile

Commands like this allow us to substitute words, strings or phrases in our files. We just have to remember to move outfile to infile after we verify (as needed) that the expected changes were made.
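As a minimal sketch of that verify-then-replace routine, you might compare the two files with diff before overwriting the original. The sample text and the scratch directory are made up for illustration. Note that without a trailing g, the substitution changes only the first match on each line.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input.
printf 'this and this\nno match here\n' > infile

sed 's/this/that/' infile > outfile

# Show what changed; diff exits non-zero when the files differ.
diff infile outfile || true

# Replace the original only if sed produced a non-empty file.
if [ -s outfile ]; then
    mv outfile infile
fi

first_line=$(sed -n '1p' infile)
echo "$first_line"
```

Adding the g flag (s/this/that/g) would change every "this" on the line rather than just the first.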

If you want to remove the first line from a file -- maybe you want the list of items contained in a file, but not the headings -- you can do this:

sed '1d' myfile > newfile
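The same idea works for the last line, since sed accepts $ as the address of the final line. Here is a small sketch using an invented list with a heading and a trailing total:

```shell
#!/bin/sh
# Hypothetical list: a heading, two items and a total line.
list=$(printf 'ITEMS\napples\npears\nTOTAL 2\n')

# Drop only the heading.
no_heading=$(printf '%s\n' "$list" | sed '1d')

# Drop the heading and the last line.
items_only=$(printf '%s\n' "$list" | sed -e '1d' -e '$d')

printf '%s\n' "$items_only"
```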

We could use sed in a script like this to remove blank lines from a series of files:

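#!/bin/bash

# Let the shell expand the pattern itself; parsing ls output breaks on
# file names that contain spaces.
for file in myfile*
do
    # Replace the original only if sed succeeded.
    sed '/^$/d' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done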

The ^$ in this script represents blank lines (lines that contain no characters at all) and the "d" tells sed to remove those lines. The / characters are the delimiters.

/^$/d
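One thing to watch for: a line that holds only spaces or tabs is not empty, so /^$/ will not match it. A sketch of a variant that also removes whitespace-only lines, using invented sample text:

```shell
#!/bin/sh
# Sample text: line 2 is empty, line 3 holds only spaces (made-up data).
text=$(printf 'first\n\n   \nlast\n')

# Removes only the truly empty line.
empty_removed=$(printf '%s\n' "$text" | sed '/^$/d')

# Removes empty lines and lines made up entirely of whitespace.
blank_removed=$(printf '%s\n' "$text" | sed '/^[[:space:]]*$/d')

printf '%s\n' "$empty_removed" | wc -l   # 3 lines: the spaces-only line survives
printf '%s\n' "$blank_removed" | wc -l   # 2 lines
```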

You can delete everything from the first line through to the first blank line -- basically the first paragraph -- using an expression like this:

1,/^$/d

Put that command in a file (e.g., sed12) and run it against your target file like so:

$ sed -f sed12 1492
He had three ships and left from Spain;
He sailed through sunshine, wind and rain.

He sailed by night; he sailed by day;
He used the stars to find his way.

The first stanza of "In fourteen hundred ninety-two, Columbus sailed the ocean blue" gets blown away.
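To try this yourself, here is a sketch that builds both files in a scratch directory. The line breaks in the first stanza are an assumption, since only its words are quoted above. It also shows that the same expression can be given directly on the command line instead of in a script file.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The poem file; the layout of the first stanza is assumed.
cat > 1492 <<'EOF'
In fourteen hundred ninety-two
Columbus sailed the ocean blue.

He had three ships and left from Spain;
He sailed through sunshine, wind and rain.

He sailed by night; he sailed by day;
He used the stars to find his way.
EOF

# A one-command script file.
printf '1,/^$/d\n' > sed12

from_script=$(sed -f sed12 1492)
from_cmdline=$(sed '1,/^$/d' 1492)

printf '%s\n' "$from_script"
```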

But even commands like these just touch the surface of what sed can do.

We can put a series of editing changes into a sed script and then apply those changes repeatedly as needed. The only thing you need to watch out for is making sure that the order in which you make your changes works. If you were to put these sed commands in a script, for example, any instances of "one" in your original file would end up as "four" since each of the changes shown will be made sequentially to each line in your file.

s/one/two/
s/two/three/
s/three/four/
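A quick way to convince yourself of this is to run the three commands in both orders. This is only a sketch; the -e options stand in for the lines of a script file.

```shell
#!/bin/sh
# Commands in the order shown: each one sees the result of the previous one.
chained=$(printf 'one\n' | sed -e 's/one/two/' -e 's/two/three/' -e 's/three/four/')

# Same commands reversed: only the last substitution finds anything to change.
reversed=$(printf 'one\n' | sed -e 's/three/four/' -e 's/two/three/' -e 's/one/two/')

echo "$chained"    # four
echo "$reversed"   # two
```

If you want each word bumped by exactly one step, listing the commands from the highest number down, as in the reversed version, avoids the cascade.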

The script below would turn references to "last year" into "2011", "this year" into "last year" and "next year" into "this year". It also changes "have" to "had" and then changes "had" back to "have", but only if the line contains the word "next".

s/last year/2011/
s/this year/last year/
s/have/had/
/next/s/had/have/
s/next/this/

OK, maybe not the best example possible, but it would change this:

$ cat funds
For this year, we have $500K
For last year, we had only $300K
For next year, we will have $550K

into this:

$ sed -f sed6 funds
For last year, we had $500K
For 2011, we had only $300K
For this year, we will have $550K

So, as just shown, you can make changes that depend on the content of particular lines. You can also constrain your changes to, say, the first 10 lines of a file, lessening the chance of any unintended changes, by adding a line specification like so:

1,10{
  s/last year/2011/
  s/this year/last year/
  s/have/had/
  /next/s/had/have/
  s/next/this/
}
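The funds file has only three lines, so to see a range at work on it, this sketch narrows the range to lines 1 and 2. The script file name sed_range is invented for the example. The third line should come through untouched.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > funds <<'EOF'
For this year, we have $500K
For last year, we had only $300K
For next year, we will have $550K
EOF

# Same commands as above, limited to lines 1-2 (a range chosen for this small file).
cat > sed_range <<'EOF'
1,2{
  s/last year/2011/
  s/this year/last year/
  s/have/had/
  /next/s/had/have/
  s/next/this/
}
EOF

result=$(sed -f sed_range funds)
printf '%s\n' "$result"
```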

The sed command shown below can come in handy if you want to remove HTML tags from a file, leaving just the text. It does this by looking for strings that begin with <, contain some text other than > and then end in >. In this expression, the [^>]* string matches any characters other than >, and as many of them as there are before we reach the > sign. This kind of technique can save you a lot of time if you need to extract just the text from HTML files.

$ sed -e 's/<[^>]*>//g' index.html > index.txt
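Here is the expression applied to a single made-up line of markup. Keep in mind that sed works one line at a time, so a tag that is split across two lines will not be matched, and character entities such as &amp; are left as they are.

```shell
#!/bin/sh
# Hypothetical one-line HTML fragment.
html='<p>Hello, <b>world</b>!</p>'

text=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

echo "$text"   # Hello, world!
```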

There is also a set of named character classes that can save you time working with sed, since you can use the expressions shown below rather than having to build the character sets yourself. Note that each one is used inside a bracket expression, as in [[:digit:]].

[:alnum:]	Alphanumeric [a-z A-Z 0-9]
[:alpha:]	Alphabetic [a-z A-Z]
[:blank:]	Spaces or tabs
[:cntrl:]	Any control characters
[:digit:]	Numeric digits [0-9]
[:graph:]	Any visible characters (no whitespace)
[:lower:]	Lower-case [a-z]
[:print:]	Non-control characters
[:punct:]	Punctuation characters
[:space:]	Whitespace
[:upper:]	Upper-case [A-Z]
[:xdigit:]	hex digits [0-9 a-f A-F]
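A couple of small sketches with invented input show the classes in use. Each class name sits inside its own outer pair of brackets, which is why the expressions have doubled brackets.

```shell
#!/bin/sh
# Remove every digit.
no_digits=$(printf 'abc123def\n' | sed 's/[[:digit:]]//g')

# Remove every punctuation character.
no_punct=$(printf 'Hello, world!\n' | sed 's/[[:punct:]]//g')

# Classes can be combined in one bracket expression: digits or upper-case letters.
masked=$(printf 'Room B12\n' | sed 's/[[:digit:][:upper:]]/#/g')

echo "$no_digits"   # abcdef
echo "$no_punct"    # Hello world
echo "$masked"      # #oom ###
```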

The command shown below, for example, will strip out any strings that start with letters but also contain digits. The \+ and \? operators it uses are GNU sed extensions.

sed 's/\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\) \?//g' myfile
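If your sed does not support \+ and \?, the same pattern can be spelled out with standard basic regular expression syntax. The sketch below is one such rewrite, with invented sample input: X\+ becomes XX* and the optional space becomes an interval expression. The grouping parentheses are dropped because the match is never referred to.

```shell
#!/bin/sh
# Made-up input: "abc123" and "x9y" start with letters and contain digits.
input='abc123 def 456 x9y end'

cleaned=$(printf '%s\n' "$input" |
    sed 's/[[:alpha:]][[:alpha:]]*[[:digit:]][[:digit:]]*[[:alnum:]]* \{0,1\}//g')

echo "$cleaned"   # def 456 end
```

Because the pattern is not tied to the start of a word, it will also match the letters-then-digits portion inside a longer string, so check the results on real data.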

The sed utility is a lot more complicated and far more versatile than most of us have come to recognize. Given a couple of good online tutorials and enough examples, you might find yourself taking more advantage of this powerful Unix utility.

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.
