April 20, 2014, 6:45 PM
Basic Unix commands make it easy to determine whether files contain particular strings. Where would we be without commands like grep? But sometimes when using grep, you can get answers that under- or overreport the presence of what you are looking for. Take a very simple grep command as an example.
$ grep word mybigfile | wc -l
98
Commands like this tell you how many lines contain the word you are looking for, but not necessarily how many times that word appears in the file. After all, the word "word" might appear two or more times in a single line and yet will be counted only once. Plus, if the word could be part of longer words (as "word" is part of "password" and "sword"), you might even get some false positives. So you can't depend on the result to give you an accurate count, or even to tell you whether the word you are looking for appears at all, unless, of course, the word you are looking for just isn't going to be part of another word (like, maybe, chicken).
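To make the gap concrete, here is a minimal sketch that runs the same pipeline against a small throwaway file. The file contents and the variable names (sample, line_count) are invented for illustration and are not from mybigfile.

```shell
#!/bin/bash
# Hypothetical sample data: "word" stands alone twice (both on line 1)
# and is embedded in "password" and "sword" on lines 2 and 3.
sample=$(mktemp)
printf '%s\n' 'one word and another word' 'my password' 'a sword' > "$sample"

# grep prints each matching LINE once, so wc -l counts lines, not words.
line_count=$(grep word "$sample" | wc -l)
echo "lines matching: $line_count"

rm -f "$sample"
```

Here the pipeline reports 3 even though "word" appears on its own only twice: the first line is undercounted and the other two lines are false positives.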
Trick #1: grep with -w
If you want to be sure that you count only the lines containing "word" as a word in its own right, you can add the -w option to your grep command. This option tells grep to match "word" only when it stands on its own, not when it is part of another word. Note that the result is still a count of lines, not of individual occurrences.
$ grep -w word mybigfile | wc -l
54
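One way to see what -w changes, and what it leaves unchanged, is to compare the two pipelines on a tiny file. This is only a sketch; the sample contents and the variable names (sample, plain, whole) are made up for the example.

```shell
#!/bin/bash
# Hypothetical sample data: "word" stands alone three times (twice on
# line 1, once on line 4) and is embedded in "password" and "sword".
sample=$(mktemp)
printf '%s\n' 'word for word' 'password' 'sword' 'last word' > "$sample"

plain=$(grep word "$sample" | wc -l)      # all four lines contain the string
whole=$(grep -w word "$sample" | wc -l)   # only lines 1 and 4 qualify

echo "without -w: $plain"
echo "with -w:    $whole"

rm -f "$sample"
```

With this input the plain search counts 4 lines and the -w search counts 2. The false positives are gone, but the line holding "word" twice is still counted once, so the total is still short of the three real occurrences.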
Trick #2: looping through every word
To be sure that you count every instance of the word you are looking for, you might elect to use some technique that examines every word in a file independently. The easiest way to do this is to use a bash for command. After all, any time you use a for command, such as for letter in a b c d e, the command loops once for every argument provided. And, if you use a command such as for letter in `cat mybigfile`, it will loop through every word (i.e., every whitespace-separated piece of text on every line) in the file.
$ count=0
$ for word in `cat mybigfile`
> do
>   if [ "$word" == "word" ]; then
>     count=`expr $count + 1`
>   fi
> done
$ echo $count
71
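The same loop can be wrapped in a function so it is easier to reuse and test. The sketch below is one possible variant, not the only way to write it: count_word is a hypothetical name, the sample file is invented, and it swaps expr for the shell's built-in arithmetic and turns off filename expansion so that a stray * in the file is treated as text.

```shell
#!/bin/bash
# count_word is a hypothetical helper name, not from the article.
# Usage: count_word WORD FILE
# The body is in ( ) so it runs in a subshell and set -f stays local.
count_word() (
    target=$1
    file=$2
    count=0
    set -f    # disable glob expansion of the words read from the file
    # Leaving $(cat ...) unquoted is deliberate: the shell splits the
    # file's contents on whitespace, giving one iteration per word.
    for w in $(cat "$file"); do
        if [ "$w" = "$target" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Hypothetical sample data: three standalone instances of "word".
sample=$(mktemp)
printf '%s\n' 'word for word' 'password' 'sword' 'last word' > "$sample"

total=$(count_word word "$sample")
echo "occurrences: $total"

rm -f "$sample"
```

For this sample the function reports 3, counting both instances on the first line and skipping "password" and "sword". Because the comparison is an exact string match, a token such as "word," with trailing punctuation would not be counted.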
If you need to do this kind of thing often (that is, look for particular words in arbitrary files), then you might want to commit the operation to a script so that you don't have to type the looping and if commands time and time again. Here's an example script that will prompt you for the word you are looking for and the file you want to look through if you don't provide both of them on the command line.
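#!/bin/bash

# prompt for the word and the file unless both were given as arguments
if [ $# -lt 2 ]; then
    echo -n "look for> "
    read lookfor
    echo -n "file> "
    read filename
else
    lookfor=$1
    filename=$2
fi

# start at zero so that a number is printed even when nothing matches
count=0
for w in `cat "$filename"`
do
    if [ "$w" == "$lookfor" ]; then
        count=`expr $count + 1`
    fi
done

echo $count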
Trick #3: Looking for patterns
More interesting than looking for some specific word is the challenge of looking for various patterns in particular files.