Finding Text in Context

By Sandra Henry-Stocker, ITworld.com

Unix tools are great for finding particular text in an input stream and for selecting a portion of each line to display. Much as a query selects rows and columns from a table in a relational database, commands like grep and awk allow us to trim huge amounts of data down to just what we want to see. In fact, the ability to pipe commands together and get a useful answer in a single line of commands continues to be one of the great appeals of our favorite operating system (in all its many flavors).



But what should you do if you want to see not just some target text, but a portion of the text surrounding it? What if you need to see your search terms in context to know whether your hits are relevant? When one of you readers asked me this question, my first reaction was ... good question! Then I started wondering how hard it would be in Perl to capture and display the text surrounding my search terms and how much data I might have to load into memory.



But soon afterwards, I was trying my hand at GNU grep and found that its easy-to-use options for including lines before and after the target text turn what might have been a tricky scripting task into a veritable fait accompli! Check out this GNU grep command run against the text of the Gettysburg Address. The -B (before) and -A (after) arguments ask grep to print two lines before and two lines after each line containing the search term. Notice, however, that our displayed text includes not five, but seven lines. This is because the target text (the word "consecrate") appears twice in this excerpt, on lines close enough together that their context overlaps. We get, therefore, two lines before the first appearance of the word and two lines after the second. There is a single line of text between the two lines containing our target word.

  bash-2.03$ /opt/gnu/bin/grep -B 2 -A 2 consecrate gburg.txt
  this.

  But, in a larger sense, we can not dedicate -- we can not consecrate -- we
  can not hallow -- this ground. The brave men, living and dead, who struggled
  here, have consecrated it, far above our poor power to add or detract. The
  world will little note, nor long remember what we say here, but it can never
  forget what they did here. It is for us the living, rather, to be dedicated
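
When the context windows of two hits do not overlap, GNU grep prints each group separately with a "--" line between them. It also accepts -C N as shorthand for -B N -A N. The sketch below is only an illustration: it assumes that the grep on your PATH is GNU grep, and it uses a made-up ten-line sample file rather than gburg.txt.

```shell
# Hypothetical sample file; the two hits are far enough apart
# that their one-line context windows do not overlap.
sample=$(mktemp)
printf '%s\n' alpha bravo 'charlie consecrate' delta echo \
              foxtrot golf hotel 'india consecrate' juliet > "$sample"

# One line of context on each side, written two ways.
out_ba=$(grep -B 1 -A 1 consecrate "$sample")
out_c=$(grep -C 1 consecrate "$sample")

printf '%s\n' "$out_c"
# bravo
# charlie consecrate
# delta
# --
# hotel
# india consecrate
# juliet

rm -f "$sample"
```

In the Gettysburg example the two windows overlap, which is why grep merges them into a single seven-line block with no separator.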


How nice not to have had to think through the algorithm to display multiple hits in this seamless manner! The -B and -A options can also be specified as --before-context and --after-context, though I prefer to type as little as possible so I use the one-letter versions.

Of course, where there's one good option, there may be another, so let's see what else GNU grep can provide. Another GNU grep option, -m, allows you to limit the number of matching lines that grep reports. Going back to our Gettysburg Address example, Abe Lincoln used the word "dedicate" (or a derivative) six times in his 275-word address. Let's say we want to see only the first three instances of this word. No problem for GNU grep:

  bash-2.03$ /opt/gnu/bin/grep -m 3 dedicate gburg.txt
  a new nation, conceived in Liberty, and dedicated to the proposition that
  nation so conceived and so dedicated, can long endure. We are met on a great
  battle-field of that war. We have come to dedicate a portion of that field,
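
One detail worth keeping in mind is that -m counts matching lines, not individual occurrences of the word. The following sketch uses a made-up four-line file (not gburg.txt) and assumes GNU grep is the grep on your PATH. It compares -m with -c, which counts matching lines, and with -o, which prints each occurrence on a line of its own.

```shell
# Hypothetical sample: the first line contains the word twice.
sample=$(mktemp)
printf '%s\n' 'dedicate and dedicate again' \
              'nothing here' \
              'dedicated once' \
              'dedicate last' > "$sample"

# Stop after the first two matching lines.
first_two=$(grep -m 2 dedicate "$sample")
# Count matching lines.
line_count=$(grep -c dedicate "$sample")
# Count every occurrence, including repeats on the same line.
occurrences=$(grep -o dedicate "$sample" | wc -l | tr -d ' ')

printf '%s\n' "$first_two"
# dedicate and dedicate again
# dedicated once
printf 'lines: %s, occurrences: %s\n' "$line_count" "$occurrences"
# lines: 3, occurrences: 4

rm -f "$sample"
```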



Or, maybe we want to print line numbers along with our located text:

  bash-2.03$ /opt/gnu/bin/grep -n dedicate gburg.txt
  2:a new nation, conceived in Liberty, and dedicated to the proposition that
  6:nation so conceived and so dedicated, can long endure. We are met on a great
  7:battle-field of that war. We have come to dedicate a portion of that field,
  12:But, in a larger sense, we can not dedicate -- we can not consecrate -- we
  16:forget what they did here. It is for us the living, rather, to be dedicated
  18:nobly advanced. It is rather for us to be here dedicated to the great task
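
Line numbers also combine with the context options. In that case GNU grep marks the matching lines with a colon after the line number and the surrounding context lines with a hyphen, which makes it easy to tell the two apart. Here is a small sketch under the same assumptions as before: GNU grep on your PATH and a made-up three-line sample file.

```shell
# Hypothetical three-line sample with the hit in the middle.
sample=$(mktemp)
printf '%s\n' first 'second dedicate' third > "$sample"

# -n adds line numbers; -C 1 adds one line of context on each side.
numbered=$(grep -n -C 1 dedicate "$sample")

printf '%s\n' "$numbered"
# 1-first
# 2:second dedicate
# 3-third

rm -f "$sample"
```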


And, if we have trouble locating our target words in the text, we can tell GNU grep to colorize our search terms:


  bash-2.03$ /opt/gnu/bin/grep -n --color=always dedicate gburg.txt
  2:a new nation, conceived in Liberty, and dedicated to the proposition that
  6:nation so conceived and so dedicated, can long endure. We are met on a great
  7:battle-field of that war. We have come to dedicate a portion of that field,
  12:But, in a larger sense, we can not dedicate -- we can not consecrate -- we
  16:forget what they did here. It is for us the living, rather, to be dedicated
  18:nobly advanced. It is rather for us to be here dedicated to the great task
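
The choice between --color=always and --color=auto matters once the output goes somewhere other than your terminal. With always, grep inserts the color escape sequences no matter where the output is headed; with auto, it adds them only when writing to a terminal. The sketch below assumes GNU grep on your PATH and a made-up one-line sample file. It captures the output of both forms and uses cat -v to make any escape characters visible.

```shell
# Hypothetical one-line sample.
sample=$(mktemp)
printf '%s\n' 'we can not dedicate this ground' > "$sample"

# Captured output is not a terminal, so the two modes differ here.
colored=$(grep --color=always dedicate "$sample")   # contains escape sequences
plain=$(grep --color=auto dedicate "$sample")       # plain text only

# cat -v shows each escape character as ^[ so the difference is visible.
printf '%s\n' "$colored" | cat -v
printf '%s\n' "$plain" | cat -v

rm -f "$sample"
```

If you want to page through colorized results, piping the --color=always output to less -R should preserve the highlighting.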
