Unix tip: Using Bash's regular expressions

Bash has quietly made scripting on Unix systems a lot easier with its own regular expressions. If you're still leaning on grep and sed commands to get your scripts to do what you need from them, maybe it's time to look into what bash can do on its own.

Since version 3 (circa 2004), bash has a built-in regular expression comparison operator, represented by =~. A lot of scripting tricks that use grep or sed can now be handled by bash expressions and the bash expressions might just give you scripts that are easier to read and maintain. As with other comparison operators (e.g., -lt or ==), bash will return a zero if an expression like $digit =~ "[[0-9]]" shows that the variable on the left matches the expression on the right and a one otherwise. This example test asks whether the value of $digit matches a single digit.

if [[ $digit =~ [0-9] ]]; then
    echo '$digit is a digit'
else
    echo "oops"
fi

You can also check whether a reply to a prompt is numeric with similar syntax:

echo -n "Your answer> "
read REPLY
if [[ $REPLY =~ ^[0-9]+$ ]]; then
    echo Numeric
else
    echo Non-numeric
fi

Bash's regular expressions can be fairly complicated. In the test below, we're asking whether the value of our $email variable looks like an email address. Notice that the first expression (the account name) can contain letters, digits and some special characters. The + to the right of the first ] means that we can have any number of such characters. We then see the @ sign sitting between the username and the email domain -- and a literal dot (\.) between the primary part of the domain name and the "com", "net", "gov", etc. part. The comparison is then enclosed in double brackets.

if [[ "$email" =~ "^[A-Za-z0-9._%+-]+<b>@</b>[A-Za-z0-9.-]+<b>\.</b>[A-Za-z]{2,4}$" ]]
then
    echo "This email address looks fine: $email"
else
    echo "This email address is flawed: $email"
fi

Similarly, you can construct tests that determine whether the value of variables is in the proper format for an IP address:

#!/bin/bash

if [ $# != 1 ]; then
    echo "Usage: $0 address"
    exit 1
else
    ip=$1
fi

if [[ $ip =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
    echo "Looks like an IPv4 IP address"
elif [[ $ip =~ ^[A-Fa-f0-9:]+$ ]]; then
    echo "Could be an IPv6 IP address"
else
    echo "oops"
fi

Bash also provides for some simplified looping. Want to loop 100 times? Just do something like this:

for n in {1..100}
do
    echo $n
done

And you can loop through letters or through various ranges of letters or numbers using expressions such as these. You don't have to start with 1 or a and you can move backwards through the list.

{a..z}
{z..a}
{c..f}
{5..25}
{10..-10}

Want to see how these ranges work? You can also just try expanding them with the echo command.

$echo {a..z}
a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo {5..-1}
5 4 3 2 1 0 -1

What a swell shell!

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.

Insider: How the basic tech behind the Internet works
Join the discussion
Be the first to comment on this article. Our Commenting Policies