June 28, 2014, 5:37 PM
The Unix diff command is very handy, but it can do a lot more than tell you whether two files are the same or different. It can find the differences and show them to you in any of several formats. It can also generate files that can be used to turn one of two differing files into a copy of the other. Let’s take a look at the most common uses of diff and then see what else it can do to make your work easier.
One of the most common uses of the diff command is to tell a user whether two files, which might appear to be the same (based on size and other characteristics such as permissions, ownership and name) are actually the same. The diff command can do a byte-by-byte comparison lickety-split (that’s 19th century for “very fast”) even if the files are very large.
$ diff /usr/bin/time /bin/date
Binary files /usr/bin/time and /bin/date differ
OK, so these files are different. The executable for the time command and the one for the date command are not the same. This is no surprise, since the functionality of the commands is so different. You wouldn’t expect them to share code, so why would they be implemented as a single file? If the files you’re looking at do not differ in their content, you see no output from the diff command.
$ diff /usr/bin/zcmp /usr/bin/zdiff
$
This lack of output shouldn’t surprise us at all once we notice that these two files are actually hard links to each other, as they are on the system I’m working on:
$ ls -i /usr/bin/zcmp /usr/bin/zdiff
3257719 /usr/bin/zcmp
3257719 /usr/bin/zdiff
Notice that they have the same inode numbers, so they’re the same file.
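If you’d like to try this yourself without touching system files, here is a minimal sketch. The scratch directory and the file names (original, hardlink, copy) are made up for illustration, and the script assumes that mktemp -d is available and that the filesystem allows hard links. It also leans on diff’s exit status, which is 0 when the files match and 1 when they differ.

```shell
#!/bin/sh
# Illustrative scratch files; none of these names come from the article.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/original"
ln "$dir/original" "$dir/hardlink"   # a second name for the same inode
cp "$dir/original" "$dir/copy"       # a separate inode with the same content

# ls -i prints the inode number first, so keep only that field.
inode_original=$(ls -i "$dir/original" | awk '{print $1}')
inode_hardlink=$(ls -i "$dir/hardlink" | awk '{print $1}')
inode_copy=$(ls -i "$dir/copy" | awk '{print $1}')

# diff prints nothing and exits 0 when the contents match.
diff "$dir/original" "$dir/hardlink" > /dev/null
status_link=$?
diff "$dir/original" "$dir/copy" > /dev/null
status_copy=$?

# Change the copy; diff should now exit with status 1.
printf 'changed\n' > "$dir/copy"
diff "$dir/original" "$dir/copy" > /dev/null
status_changed=$?

echo "inodes: $inode_original $inode_hardlink $inode_copy"
echo "exit statuses: $status_link $status_copy $status_changed"
rm -rf "$dir"
```

The hard link shares an inode with the original, while the copy gets its own. diff stays quiet for both until the copy’s content changes, because diff only cares about what is in the files.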
When diff is used to detect and report on differences between text files, on the other hand, you can expect to see far more interesting results. If the files contain exactly the same text, we’ll again see no output, but if we examine two very simple but different text files with diff, we’ll see something like this:
$ diff one two
1d0
< one
3d1
< three
5d2
< five
7c4,6
< seven
---
> eight
> ten
> twelve
Sure, you might just look at this output and conclude that the files are different, and obviously they are. But this output can be a lot more useful than that. What do those command sequences like 3d1 actually mean?
To best understand what is happening here, let’s first look at the contents of the two files, one and two. They look like this:
one      two
===      ===
one      two
two      four
three    six
four     eight
five     ten
six      twelve
seven
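To follow along on your own system, you can recreate the two files and run the comparison yourself. This is only a sketch: the scratch directory is an assumption (any writable directory will do), and it assumes mktemp -d is available.

```shell
#!/bin/sh
# Build the two sample files in a scratch directory (location is arbitrary).
dir=$(mktemp -d)
printf '%s\n' one two three four five six seven > "$dir/one"
printf '%s\n' two four six eight ten twelve > "$dir/two"

# Capture the output and the exit status; diff exits 1 when files differ.
output=$(diff "$dir/one" "$dir/two")
status=$?

printf '%s\n' "$output"
echo "exit status: $status"
rm -rf "$dir"
```

The printed output should match the diff listing shown earlier, since the default format does not include the file names or their location.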
Now look back at the diff output. In each command string (e.g., 3d1), the part before the letter (e.g., 3) is a line number or a range of lines. The “3” means “line 3”.
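To make the anatomy of these command strings concrete, here is a small sketch that splits each one into the part before the letter, the letter itself, and the part after it. The helper name split_cmd is made up for this example, and it assumes the letter is always one of a, c, or d, which are the letters diff uses in its default output format.

```shell
#!/bin/sh
# split_cmd is a hypothetical helper, not part of diff.
# It sets three variables: left, action, and right.
split_cmd() {
    cmd=$1
    left=${cmd%%[acd]*}        # everything before the letter
    right=${cmd#*[acd]}        # everything after the letter
    action=${cmd#"$left"}      # strip the left part ...
    action=${action%"$right"}  # ... and the right part, leaving the letter
}

# The four command strings from the diff output above.
for c in 1d0 3d1 5d2 7c4,6; do
    split_cmd "$c"
    printf '%-6s before: %-3s letter: %s  after: %s\n' \
        "$c" "$left" "$action" "$right"
done
```

For 3d1, the helper reports 3 before the letter, which is the “line 3” described above. For 7c4,6, the part after the letter is a range (4,6) rather than a single line number.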