Unix: Counting chickens or anything else

Unix tools make it easy to find strings in files, but what if you want to find specific whole words, more complex text patterns, or every instance of a word?

By  

Basic Unix commands make it easy to determine whether files contain particular strings. Where would we be without commands like grep? But sometimes when using grep, you can get answers that under- or overreport the presence of what you are looking for. Take a very simple grep command for an example.

$ grep word mybigfile | wc -l
98

Commands like this tell you how many lines contain the word you are looking for, but not necessarily how many times that word appears in the file. After all, the word "word" might appear twice or more times in a single line and yet will only be counted once. Plus, if the word could be part of longer words (like "word" is a part of the word "password" and the word "sword"), you might even get some false positives. So you can't depend on the result to give you an accurate count or even if the word you are looking for appears at all unless, of course, if the word you are looking just isn't going to be part of another word -- like, maybe, chicken.

Trick #1: grep with -w

If you want to be sure that you count only the lines containing "word", you can add the -w option with your grep command. This option tells grep to only look for "word" when it's a word on its own, not when it is part of another word.

$ grep -w word mybigfile | wc -l
54

Trick #2: looping through every word

To be sure that you count every instance of the word you are looking for, you might elect to use some technique that examines every word in a file independently. The easiest way to do this is to use a bash for command. After all, any time you use a for command, such as for letter in a b c d e, the command loops once for every argument provided. And, if you use a command such as for letter in `cat mybigfile`,
it will loop through every word (i.e., every piece of text on every line) in the file.

$count=0
$ for word in `cat mybigfile`
> do
>   if [ $word == "word" ]; then
>      count=`expr $count + 1`
>   fi
> done
$ echo $count
71

If you need to do this kind of thing often -- that is, look for particular words in arbitrary files, then you might want to commit the operation to a script so that you don't have to type the looping and if commands time and time again. Here's an example script that will prompt you for the word you are looking for and the file you want to look through if you don't choose to provide them on the command line.

#!/bin/bash

if [ $# -le 2 ]; then
    echo -n "look for> "
    read lookfor
    echo -n "file> "
    read filename
else
    lookfor=$1
    filename=$2
fi

for w in `cat $filename`
do
  if [ $w == "$lookfor" ]; then
    count=`expr $count + 1`
  fi
done

echo $count

Trick #3: Looking for patterns

More interesting than looking for some specific word is the challenge of looking for various patterns in particular files.

Photo Credit: 

Image Credit: flickr/ brett jordan
https://creativecommons.org/licenses/by/2.0/

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Answers - Powered by ITworld

ITworld Answers helps you solve problems and share expertise. Ask a question or take a crack at answering the new questions below.

Join us:
Facebook

Twitter

Pinterest

Tumblr

LinkedIn

Google+

Ask a Question
randomness