Unix Tip: Comparing Files with Checksums
Unix systems provide numerous ways to compare files. The most common way to verify that
you have received or downloaded the proper file is to compute a checksum and compare it
against one computed by a reliable source. MD5 is frequently used to compute checksums
because it is extremely unlikely that two different files will happen to have the same
MD5 checksum. Other commands, such as sum and cksum, also compute checksums, but less
reliably. Let's look at several checksums and see why.
One of the first things you'll notice if you compare the output of the sum, cksum and md5
commands is the length of each calculated value. The sum command prints two numbers.
The first (31339 in our example) is a 16-bit checksum. This means that you will get one
of 65,536 distinct values (from 0 to 65,535) for any file. The chance that two different
files have the same checksum is small, roughly 1 in 65,536. If you have 65,000 files
to compare, however, the chance that at least two of them have the same checksum, though
different, is very high. In fact, you'll probably have a number of false matches.
# sum /export/home/jdoe/bigfile.gz
31339 165523 home/jdoe/bigfile.gz
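The claim about false matches can be checked with a short calculation. The sketch below
assumes that checksums are spread evenly over the 65,536 possible values, which is an
idealization rather than a property of sum itself. The function name collision_stats is
made up for this illustration.

```python
import math


def collision_stats(n_files, n_values=2**16):
    """Return (chance that any two files share a checksum, expected matching pairs).

    Assumes every checksum value is equally likely (an idealization).
    """
    # Each of the n*(n-1)/2 pairs of files matches with probability 1/n_values.
    expected_pairs = n_files * (n_files - 1) / (2 * n_values)
    if n_files > n_values:
        # More files than possible values: a match is guaranteed.
        return 1.0, expected_pairs
    # P(no match) = product over i of (1 - i/n_values); sum logs to avoid underflow.
    log_p_none = sum(math.log1p(-i / n_values) for i in range(n_files))
    return 1.0 - math.exp(log_p_none), expected_pairs


for n in (2, 300, 65000):
    p_any, pairs = collision_stats(n)
    print(n, "files: chance of a match", p_any, "expected matching pairs", pairs)
```

Under that assumption, a match becomes about as likely as not once you have roughly 300
files, and with 65,000 files you should expect tens of thousands of matching pairs.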
One characteristic of the sum command is that the value of the checksum has a simple
relationship to the contents of the file. If one file contains "abc" and another contains
"abd", the checksums differ by only 1. This command is clearly using a very
simple calculation, better suited to a quick integrity check of a file than to heavy-duty
or high-security file checking.
# sum /tmp/ab*
304 1 /tmp/abc
305 1 /tmp/abd
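The values 304 and 305 are what you get by simply adding up the bytes in each file,
provided each file ends with a newline. The sketch below assumes the System V style
algorithm (add all the bytes, then fold the total into 16 bits) and 512-byte blocks. Both
are assumptions that happen to agree with the output shown here, not something the
command documents in its output. The function name sysv_sum is made up for this
illustration.

```python
def sysv_sum(data):
    """Return (checksum, blocks) in the style of a System V `sum` (assumed algorithm)."""
    total = sum(data) & 0xFFFFFFFF                   # add up every byte value
    folded = (total & 0xFFFF) + (total >> 16)        # fold the high bits into the low 16
    checksum = (folded & 0xFFFF) + (folded >> 16)    # fold once more to absorb any carry
    blocks = (len(data) + 511) // 512                # 512-byte blocks, rounded up
    return checksum, blocks


print(sysv_sum(b"abc\n"))  # (304, 1)
print(sysv_sum(b"abd\n"))  # (305, 1)
print(sysv_sum(b"acb\n"))  # (304, 1): same bytes in a different order
```

The last line shows the weakness of a plain sum: rearranging the bytes of a file leaves
the checksum unchanged.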
The second number that sum prints is the number of 512-byte blocks that are in the file.
This helps considerably to ensure that dissimilar files are recognized as dissimilar. Unless
the files you are comparing are also roughly the same size, a match between their checksums
can be discounted.