Grep this

Unix Insider | Operating Systems

The grep utility, which allows files to be searched for strings of words, uses a syntax similar to the regular expression syntax of the vi, ex, ed, and sed editors. grep comes in three flavors, grep, fgrep, and egrep, all of which I'll cover in this article.

The name grep is derived from the editor command g/re/p, which literally translates to "globally search for a regular expression and print what you find." Regular expressions are at the core of grep, and I'll cover them after a brief description of some of the utility's command options.
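
As a rough illustration of that derivation, the same "global regular expression print" idea can be written with sed, where -n suppresses the default output and the p command prints each line matching the pattern. The file story.txt and its contents are made up for this sketch, and it says nothing about how grep is implemented internally.

```shell
# Hypothetical sample file, created only for this illustration.
tmp=$(mktemp -d)
printf '%s\n' 'so I said hello and she smiled back' \
              'then we both went home' > "$tmp/story.txt"

# sed: for every line that matches the regular expression /hello/, print it.
sed_out=$(sed -n '/hello/p' "$tmp/story.txt")

# grep: the same search as a stand-alone command.
grep_out=$(grep 'hello' "$tmp/story.txt")

printf '%s\n' "$sed_out"
printf '%s\n' "$grep_out"
```

Both commands print the one line of the sample file that contains "hello."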

The simplest grep command is grep (search pattern) (files list), as in:

 grep hello *

The output of this command might be something like this:

 $ grep hello *
 story.txt: so I said hello and she smiled back
 intro.txt: use the hello.c program as an example of C programming
 $

grep is case sensitive, so in order to change the search to include "hello," "Hello," or "HELLO," use the -y or -i option. Earlier versions of grep used -y, and later versions use -i. -y is now considered obsolete, although some versions of grep do support both. In the following example, more hellos show up because the search is case independent.

 $ grep -i hello *
 story.txt: so I said hello and she smiled back
 story.txt: I could hear my echo, "HELLO."
 intro.txt: use the hello.c program as an example of C programming
 hello.c:     printf("Hello, world. \n");
 $

This command searches every file in the current directory that the * wild card matches, and for each line containing the string "hello" in any mix of upper and lower case it prints the file name and the line.

The output of grep varies depending on whether you're searching one or several files. If only one file is named on the command line, the output doesn't include the file name, as in the following example:

 $ grep -i hello hello.c
     printf("Hello, world. \n");
 $

The one-file rule applies whether you use a wild card in your file list or not. If hello.c were the only file in the current directory, using a wild card to locate the file would still produce an unnamed file output. In the following example, the user is searching for any C files containing "hello." There is only one C file in the directory, so the output is identical to the previous example.

 $ grep -i hello *.c
     printf("Hello, world. \n");
 $

I don't know of a grep that has a work-around for this behavior, but you could use the -l option instead, which prints the file name only and not the line containing the string. At least you would know the name of the file that contained the string.

 $ grep -il hello *.c
 hello.c
 $
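
A work-around that is often used for the one-file behavior is to name /dev/null as an extra file, so that grep always sees more than one file and prefixes each match with its file name. The sketch below assumes a scratch directory holding a single made-up hello.c. It also shows -H, which GNU and BSD grep accept for the same purpose; whether your grep has -H is an assumption to check against its manual page.

```shell
# Hypothetical one-file directory, created only for this illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '#include <stdio.h>' \
              'int main(void) {' \
              '    printf("Hello, world. \n");' \
              '    return 0;' \
              '}' > hello.c

# /dev/null is always empty, so it never matches, but it counts as a
# second file and makes grep print the file name before each match.
with_devnull=$(grep -i hello *.c /dev/null)
printf '%s\n' "$with_devnull"

# Assumption: this grep supports -H (GNU and BSD grep do), which forces
# the file name to be printed even for a single file.
with_H=$(grep -iH hello *.c)
printf '%s\n' "$with_H"
```

Either form keeps both the file name and the matching line, which -l alone does not.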

The -l option can be used to extract a list of files containing the string. The file name is printed only once, even though the string may appear in multiple lines within that file. In the following example, story.txt appears only once, even though it contains more than one "hello."

 $ grep -il hello *
 hello.c
 intro.txt
 story.txt
 $
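
Because -l prints one file name per line, its output can drive another command. The following is a minimal sketch under assumed conditions: the three files and their contents are invented, and making a .bak copy is only a stand-in for whatever you want to do with each matching file.

```shell
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' 'so I said hello and she smiled back' \
              'I could hear my echo, "HELLO."' > story.txt
printf '%s\n' 'use the hello.c program as an example of C programming' > intro.txt
printf '%s\n' 'nothing to see here' > notes.txt

# Collect the names of files that contain "hello" in any case.
# story.txt matches on two lines but is listed only once.
files=$(grep -il hello *)
printf '%s\n' "$files"

# Read the list one name per line and act on each matching file.
printf '%s\n' "$files" | while IFS= read -r f; do
    cp "$f" "$f.bak"
done
```

Reading the names line by line copes with spaces in file names, though not with names that contain newlines.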

The -l option suppresses most of the other output options from grep. On the other hand, the -n option prints the line number of each matching line ahead of its text.
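
A short sketch of -n follows, using made-up versions of hello.c and story.txt; the line numbers it reports depend entirely on these invented contents. With one file the output is the line number, a colon, and the text. With several files the file name comes first.

```shell
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '#include <stdio.h>' \
              'int main(void) {' \
              '    printf("Hello, world. \n");' \
              '    return 0;' \
              '}' > hello.c
printf '%s\n' 'it was a quiet day' \
              'so I said hello and she smiled back' > story.txt

# One file named: each match is printed as line-number:text
one=$(grep -in hello hello.c)
printf '%s\n' "$one"

# Several files named: each match is printed as file-name:line-number:text
many=$(grep -in hello hello.c story.txt)
printf '%s\n' "$many"
```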

1 comment

    Anonymous 1 year ago
    I have a .CSV file containing records that should have 31 commas. Can I use grep to identify the good records versus the bad ones? Some of the fields have their own embedded commas and they are causing problems. Thanks.
