Unix: Having fun with diff

The diff command can be your best friend when you’re troubleshooting problems on a Unix system, but are you using it as effectively as you could?

The Unix diff command is very handy, but it can do a lot more than just let you know whether two files that you’re evaluating are the same or different. It can find the differences and show them to you in any of several different formats. It can also generate scripts that turn one file into a copy of another. Let’s take a look at the most common uses of diff and then see what else it can do to make your work easier.

One of the most common uses of the diff command is to tell a user whether two files that appear to be the same (based on size and other characteristics such as permissions, ownership and name) are actually the same. The diff command can do a byte-by-byte comparison lickety-split (that’s 19th century for “very fast”) even if the files are very large.

$ diff /usr/bin/time /bin/date
Binary files /usr/bin/time and /bin/date differ

OK, so these files are different. The executable for the time command and the one for the date command are not the same. This is no surprise since the functionality of the commands is so different. You wouldn’t expect them to share code, so why would they be implemented as a single file? If the files you’re comparing do not differ in content, you see no output from the diff command.

$ diff /usr/bin/zcmp /usr/bin/zdiff
$

This output shouldn’t surprise us at all once we notice that these two files are hard links to the same file, as they are on the system I’m working on:

$ ls -i /usr/bin/zcmp /usr/bin/zdiff
3257719 /usr/bin/zcmp  3257719 /usr/bin/zdiff

Notice that they have the same inode numbers, so they’re the same file.

When used to detect and report on differences between text files, on the other hand, you can expect to see a lot more interesting results. If the files contain the exact same text, we’ll see no output again, but if we examine two very simple but different text files with diff, we’ll see something like this:

$ diff one two
1d0
< one
3d1
< three
5d2
< five
7c4,6
< seven
---
> eight
> ten
> twelve

Sure, you might just glance at this output and conclude that the files are different, but the output can be a lot more useful than that. What do those command sequences like 3d1 actually mean? To best understand what is happening here, let’s first look at the contents of the two files, one and two. They look like this:

one           two
===           ===
one           two
two           four
three         six
four          eight
five          ten
six           twelve
seven
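If you’d like to follow along, here is a minimal sketch that recreates the two sample files in a scratch directory and runs diff on them. The names workdir, normal.out and status are just placeholders chosen for this sketch. It also captures diff’s exit status, which is 0 when the files match, 1 when they differ and 2 when something went wrong.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two sample files, one word per line.
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# diff exits with 0 if the files match, 1 if they differ and 2 on trouble,
# so capture the status rather than treating "1" as a failure.
status=0
diff one two > normal.out || status=$?

cat normal.out
echo "diff exit status: $status"
```

Checking the exit status is handy in scripts, where you often care only whether the files differ and not how.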

Now look back at the diff output. In each command string (e.g., 3d1), the part before the letter (e.g., 3) is a line number or a range of lines in the first file. The “3” means “line 3”. If it were, instead, “3,5”, it would mean lines 3 through 5.

$ diff one two
1d0
< one
3d1

The letter represents a command. In diff’s default output, the commands are a (append), c (change) and d (delete), and they describe how to resolve the differences between the two files. In other words, the commands could make the files identical. The numbers after the letter represent the line or the line range in the second file.

Taken together, the commands form a script that, using the patch command, will change file “one” into a copy of file “two”. It would add the missing lines (those in file two and not in file one) and remove the excess lines (those in file one and not in file two). While this might not seem like an exciting thing to do, imagine being able to capture the changes made to a configuration file and replicate just those changes elsewhere, without any other effects. The example below assumes the diff output has been saved in a file named fixit (for example, with diff one two > fixit).

$ patch one -i fixit -o one1
patching file one
$ cat one1
two
four
six
eight
ten
twelve

File one1 now holds the contents of file one with the changes applied, so it looks just like two.

In the real world, you might be changing a dozen lines and adding four to a file that’s several hundred lines long. And you might be doing this on several hundred systems. You could trust that process to someone’s “hand editing” or you could send out a “patch file” and maybe even a script to apply it. Or you could, of course, copy the changed file around. But making the changes as easy as possible and as schedulable as possible might just be the best approach. You might, after all, be sending the fixes to your customers or staff at remote locations.

$ cat runme
patch one -i fixit -o one1
mv one1 one
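A script that gets run on many systems probably ought to check that patch succeeded before replacing the original file. The following is only a sketch of one way to do that. The apply_fix function and the .new suffix are made up for this example, and the first few lines just build the sample files so the sketch can be run as is.

```shell
# Scratch setup so the sketch can be run as is; on a real system the
# files "one" and "fixit" would already be in place.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two
diff one two > fixit || true   # exit status 1 only means "files differ"

# apply_fix is a hypothetical helper: patch into a temporary file and
# replace the original only if patch reports success.
apply_fix() {
    target=$1
    patchfile=$2
    if patch -i "$patchfile" -o "$target.new" "$target"; then
        mv "$target.new" "$target"
    else
        echo "patch failed; $target left unchanged" >&2
        rm -f "$target.new"
        return 1
    fi
}

if command -v patch >/dev/null 2>&1; then
    apply_fix one fixit && cmp -s one two && echo "one now matches two"
else
    echo "patch is not installed; nothing done" >&2
fi
```

The cmp check at the end only works here because file two is on hand; on a remote system you would more likely compare checksums.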

You can also generate the diff output in ed format using the -e option and save it in a file like this:

$ diff -e one two > fixit

In this case, the fixit file will look like this:

$ cat fixit
7c
eight
ten
twelve
.
5d
3d
1d

If you wanted, you could open the file to be changed in ed and type the commands (7c, eight, and so on) as shown, and the changes would be made. Then just save the file with the w (write) command and exit ed with q (quit). To run the changes from the command line, you would do something like this:

$ (cat fixit && echo w) | ed - one
$ cat one
two
four
six
eight
ten
twelve

The parenthesized commands send the contents of the “fixit” file, followed by a w (write) command, to ed, which applies the changes to file one. The display of the file shows that the changes have been made.

Another useful way to use the diff command is with the -p option. With GNU diff, this option produces context format output (the same format that -c selects) and, for C source files, also shows which function each change is in. The differences between the two files are illustrated in a way that provides more context for the viewer. As you’ll note below, we first get some information on the update times for the files. Then we see the contents of the two files, with lines that appear only in the first file prefixed with - and lines that were changed prefixed with !. (Lines that appear only in the second file would be prefixed with +.)

$ diff -p one two
*** one 2014-06-28 17:04:05.000000000 -0400
--- two 2014-06-28 15:31:33.000000000 -0400
***************
*** 1,7 ****
- one
  two
- three
  four
- five
  six
! seven
--- 1,6 ----
  two
  four
  six
! eight
! ten
! twelve
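The context format has a close cousin, the unified format, which you request with -u and which merges the two halves of each hunk into a single listing. As a rough sketch using the same two sample files (the scratch directory and the unified.out file name are assumptions made for this example):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# -u selects unified format: removed lines start with "-", added lines
# with "+", and unchanged context lines with a space.
diff -u one two > unified.out || true   # status 1 just means "different"

# Skip the two header lines, which carry file names and timestamps.
tail -n +3 unified.out
```

What remains after the headers is a single hunk that begins with @@ -1,7 +1,6 @@, meaning that lines 1 through 7 of the first file correspond to lines 1 through 6 of the second.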

You might like the -y option even more as it will show you the differences between two files in a side-by-side fashion.

$ diff -y one two
one                                       <
two                                         two
three                                     <
four                                        four
five                                      <
six                                         six
seven                                     | eight
                                          > ten
                                          > twelve
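Side-by-side output can get unwieldy with long files. The sketch below assumes GNU diff, where -W sets the total output width and --suppress-common-lines hides the lines that match. The scratch directory and the sbs.out file name are placeholders for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# -y: side by side; -W 40: limit output to 40 columns;
# --suppress-common-lines: leave out lines that are the same in both files.
diff -y -W 40 --suppress-common-lines one two > sbs.out || true
cat sbs.out
```

With the common lines hidden, only the six lines marked with <, | or > are left.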

You can also install and use the colordiff command if you would prefer to see the differing lines in color, with one color for each source file (e.g., the lines from one file might be red, the other blue).

If the files you want to compare are on two different systems, a better approach to determining whether they’re the same is to compute a checksum on each system and compare the checksums. The md5sum command is ideal for this.

server1$ md5sum file1
0789d2dcc23a7984a47319228597c1c4  file1
server2$ md5sum file1
95ee44328db4819563548fd9789becb2  file1
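Comparing long checksums by eye is error-prone, so you might let the shell do it. This is a minimal local sketch that assumes md5sum is installed; the checksum and same_content functions and the sample file names are invented for the example. For files on two hosts, the second checksum would have to be fetched from the other system, as the comment suggests.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\n' > file1
cp file1 file1.copy
printf 'alpha\ngamma\n' > file1.changed

# Hypothetical helper: print only the checksum field for a file.
checksum() {
    md5sum "$1" | awk '{print $1}'
}

# On two hosts, the second value would come from something like
#   ssh server2 md5sum file1
# (server2 is the placeholder host name used above).
same_content() {
    [ "$(checksum "$1")" = "$(checksum "$2")" ]
}

if same_content file1 file1.copy; then
    echo "file1 and file1.copy match"
fi
if ! same_content file1 file1.changed; then
    echo "file1 and file1.changed differ"
fi
```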

Options like -i to ignore differences in case, --ignore-all-space and --ignore-blank-lines can also come in very handy when you just don’t want to be bothered with insignificant file differences. The diff command has well over 60 options, suggesting that it’s a lot more complicated and versatile than you might have come to expect.
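To see these options at work, here is a small sketch with two made-up files, a.txt and b.txt, that differ only in case, spacing and a blank line. It uses -w and -B, the short forms of --ignore-all-space and --ignore-blank-lines, and compares diff’s exit status with and without the options.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Alpha\nbeta  gamma\n\ndelta\n' > a.txt
printf 'alpha\nbeta gamma\ndelta\n' > b.txt

# Plain diff reports differences (exit status 1).
plain=0
diff a.txt b.txt > /dev/null || plain=$?

# -i ignores case, -w is the short form of --ignore-all-space and
# -B is the short form of --ignore-blank-lines.
relaxed=0
diff -i -w -B a.txt b.txt > /dev/null || relaxed=$?

echo "plain: $plain, relaxed: $relaxed"
```

An exit status of 0 from the second command means diff found nothing it considered worth reporting.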

Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.
