Unix: Writing more efficient shell scripts

Want to tune your shell scripts to run more efficiently? Let's check out a few changes that might help them run faster.

The most important thing to think about when you're preparing scripts is whether they work properly -- that they do what they're meant to do and don't cause any problems you didn't expect. Scripts should be reliable, should fail gracefully if the user calls them with the wrong arguments, and should do only predictable things. At the same time, some scripting techniques allow scripts to run much more efficiently. So let's take a look at some techniques that might get your scripts running faster.

One of the first things you can do to make your scripts run more efficiently is to reduce the amount of output that your script sends to the screen. In general, it's a good idea for a script to produce only as much output as is required for the script to do the work intended and for the person running the script to be confident that it's working. Sending output to the screen takes a lot of time compared with other commands that you are likely to use in your scripts. So, if the point of your script isn't to send a lot of output to the screen, don't.

The echo command in this script might show the user that the script is running through the days of the week -- in this case, to create some log files -- but lots of extraneous echo commands can slow a script significantly, especially if it's handling a tremendous amount of data.

for day in Mon Tue Wed Thu Fri
do
    echo $day
    touch $day.log
done
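
If you want to see the cost of screen output for yourself, time a loop with and without its echo. This is just a rough sketch -- the loop count is arbitrary and the numbers will vary from system to system -- but the version that writes to the screen will generally take considerably longer:

time for i in {1..10000}; do echo $i; done
time for i in {1..10000}; do :; done      # : is the shell's do-nothing built-in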

One way to reduce unnecessary output is to provide a verbose option so that anyone using your scripts can see additional output when -- and only when -- they need to better understand how your scripts are working or to troubleshoot some problem they are having. Most of the time, however, that output will not be needed and could be suppressed.

for day in Mon Tue Wed Thu Fri
do
    if [ "$verbose" ]; then echo $day; fi
    touch $day.log
done
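
One simple way to wire up such an option -- a sketch, assuming the script accepts a hypothetical -v flag -- is with the getopts built-in:

verbose=""
while getopts "v" opt
do
    case $opt in
        v) verbose=1;;
    esac
done

for day in Mon Tue Wed Thu Fri
do
    if [ "$verbose" ]; then echo $day; fi
    touch $day.log
done

Run with -v, the loop reports its progress; run without it, the script stays quiet.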

You should also, whenever possible, limit how much you write to external files. Interacting with the file system is another time-consuming operation. Instead of using temporary files to hold information that a script requires, you might instead store the data in variables or arrays inside your script (a small sketch using a bash array appears after the AND/OR examples below). Without the file system involved, your scripts will likely run significantly faster. Depending on the complexity of your scripts, the difference in timing might be trivial or extremely significant. I have watched scripts that seemed to run on and on finish within seconds once the output to the screen and the interaction with external files were removed.

Another time-saving option involves the logic that you use in your scripts. When possible, prioritize your if tests so that you run as few tests as possible. This is especially true if you are running through a long series of tests in an if-then-else structure. Organize your tests so that you are likely to run only one test before reaching whatever conclusion lets your script take its next action. Find ways to skip over or quickly dispose of data that meets simple "can ignore" criteria. When you can, put the most likely-to-fail test first in a compound AND.

if [[ $foo -eq 1 && $bar -eq 2 && $tmp -eq 3 ]]; then

Remember that AND tests require that all arguments be true, so the second and third tests won't even be run if the first test fails. The same guidance applies to OR commands except that, for OR, any true condition makes the entire test true, so you'd want to put the most likely-to-pass test first.

if [[ $tmp -eq 3 || $bar -eq 2 || $foo -eq 1 ]]; then
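
Returning to the earlier point about temporary files, here's a small sketch of a bash array standing in for a temp file. The log directory and the line-count example are made up for illustration; the idea is simply that the results are collected in memory rather than written to disk and read back.

results=()
for f in /var/log/*.log
do
    results+=("$(wc -l < "$f") $f")    # collect the data in memory
done

for line in "${results[@]}"            # reuse it without touching the file system again
do
    echo "$line"
done

And since the same advice includes quickly disposing of data that can be ignored, a continue at the top of a loop does exactly that:

while read line
do
    [[ $line == "#"* ]] && continue    # throw away comment lines immediately
    ...
done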

Avoid putting commands in loops if they can be run once outside of loops. In other words, don't run commands multiple times if you can run them once and reuse the results. Run commands as few times as possible, especially commands that are not built-ins (internal to the shell).

day=`date +%A`			# <= here
while [ "$stat" != "end" ]
do
  case $day in ...

while [ "$stat" != "end" ]
do
  day=`date +%A`		# <= not here
  case $day in ...

If you have a choice between using a shell built-in and a standalone command, use the built-in; it will always be faster. Along the same lines, avoid using full paths for commands. Bash has built-ins for some of the commands you will find in /bin and /usr/bin (e.g., echo), and leaving off the full path allows bash to use those built-ins rather than starting a separate process. The same reasoning argues for using the shell's internal arithmetic instead of external commands such as expr.

sum=$(( 11 + 11 ))
sum=$(( $var1 + $var2 ))
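
You can check whether a name will run as a built-in with the type command, and a pair of timing loops shows what calling out to expr costs. This is a rough sketch -- the loop count is arbitrary and the timings will vary by system:

$ type echo
echo is a shell builtin

time for i in {1..1000}; do sum=$(( $i + 1 )); done    # built-in arithmetic
time for i in {1..1000}; do sum=$(expr $i + 1); done   # starts an expr process on every pass

Because the second loop has to start a separate expr process on every iteration, it will be dramatically slower than the first.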

While the time differences will be trivial for trivial scripts, they can make a noticeable difference in complex scripts that run against large data collections. To compare the way two scripts run, use the time command. Here, we can look at the difference made by the ordering of the conditions in a compound if command with ANDed tests. First, here's the script:

$ cat ex1
#!/bin/bash

echo -n "enter foo> "; read foo;
echo -n "enter bar> "; read bar;
echo -n "enter tmp> "; read tmp;

if [[ $foo -eq 1 && $bar -eq 2 && $tmp -eq 3 ]]; then
    echo ok
fi

And here's the timing:

$ time ex1
enter foo> 3
enter bar> 2
enter tmp> 1

real    0m2.950s
user    0m0.002s
sys     0m0.002s
$ time ex1
enter foo> 1
enter bar> 2
enter tmp> 3
ok

real    0m1.755s
user    0m0.000s
sys     0m0.003s

Notice the difference in the timings. Be careful, though, about reading too much into numbers like these: for an interactive script, the real time includes however long the user takes to type each response, so much of the difference between the two runs above is typing speed. The savings from short-circuited AND tests are real, but they show up when the tests are evaluated thousands of times, such as inside a loop that processes a large file.

To make your scripts run quickly, you should also make the commands that you use as efficient as possible. Avoid unnecessary cat commands, unnecessary grep commands, unnecessary pipes, and so on whenever you can. Timing differences measured in seconds don't count for much for scripts that run against small data collections -- quick and dirty scripts can still be quick and dirty -- but scripts intended to run against big data problems can benefit significantly from more careful scripting techniques.

One last example: a command such as grep this /that/file will always run faster than cat /that/file | grep this, since the second form starts an extra process and copies all of the data through a pipe.
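
To check that difference yourself, time both forms, discarding the output so that the comparison measures the commands rather than the screen (/that/file is the same placeholder used above):

time grep this /that/file > /dev/null
time cat /that/file | grep this > /dev/null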

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.
