Compression and encryption

One of my colleagues asked me whether to compress a file before encrypting it or to encrypt before compressing it when sending the file through e-mail.

The answer is neither.

Compression algorithms such as the one used in WinZIP look for repeated sequences (e.g., groups of blanks, the sequence "the" and so on) and define a table of abbreviations that replaces each repeated sequence with a short code. For example, the frequently used four-byte sequence "the " might be represented by a two-byte code of, say, "/5". (I'm just making up the code as an illustration.) You may have noticed that uncompressed picture files (BMP and so on) often have very high compression ratios; the repeated sequences of similarly colored pixels allow long runs to be replaced efficiently by short codes. Formats such as JPG and GIF, in contrast, are already compressed internally and shrink very little when zipped.
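To see the effect of repeated sequences in practice, here is a minimal sketch using Python's standard zlib module, which implements DEFLATE, the method commonly used inside ZIP archives. DEFLATE refers back to earlier occurrences of a sequence rather than building a literal table of abbreviations, but the principle is the same. The sample text is made up for illustration.

```python
import zlib

# Text with an obviously repeated phrase, in the spirit of the "the " example.
text = b"the cat and the dog saw the bird in the tree. " * 200

# Compress with DEFLATE; repeated sequences are replaced by short references.
compressed = zlib.compress(text)

# Decompression restores the original bytes exactly (the process is lossless).
restored = zlib.decompress(compressed)

print("original bytes:  ", len(text))
print("compressed bytes:", len(compressed))
print("round trip ok:   ", restored == text)
```

The more repetition the input contains, the smaller the compressed output is relative to the original.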

In contrast, it usually is not possible to compress an encrypted file. A good encryption algorithm produces output with few repeated sequences for a compression algorithm to exploit. An exception, in theory, is a very simple technique called monoalphabetic substitution, which works much like the secret decoder rings some of us played with as kids. Because each letter is always replaced by the same substitute, the frequency of repeated sequences does not change.
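The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a test of any real cipher: random bytes from os.urandom stand in for the output of a good encryption algorithm, and a fixed shuffle of all 256 byte values stands in for a monoalphabetic substitution.

```python
import os
import random
import zlib

plaintext = b"This sample has lots of repeated text. " * 2048

# Assumption: random bytes serve as a stand-in for strong ciphertext,
# which should look statistically random to a compressor.
random_like = os.urandom(len(plaintext))

# Monoalphabetic substitution: every byte value always maps to the same
# substitute, so repeated sequences stay repeated.
rng = random.Random(0)          # fixed seed so the mapping is reproducible
permutation = list(range(256))
rng.shuffle(permutation)
table = bytes(permutation)
substituted = plaintext.translate(table)

for label, data in [("plaintext", plaintext),
                    ("substitution cipher", substituted),
                    ("random stand-in", random_like)]:
    print(f"{label:20s} {len(data):6d} -> {len(zlib.compress(data)):6d} bytes")
```

The substituted text should compress about as well as the plaintext, while the random stand-in should not shrink at all.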

Pretty Good Privacy (PGP), in particular, compresses a file by default before it encrypts the data, so there is no need to do so manually. You can verify this claim by comparing the size of a PGP-encrypted file with the size of the WinZIP version of that encrypted file. For example, I just encrypted a simple 145K-byte RTF file out of curiosity: The PGP file is 25K bytes, the ordinary WinZIP file of the RTF is 23K bytes, and the WinZIP of the PGP file is 25K bytes. In other words, the PGP output is already about as small as the zipped original, and zipping it again gains nothing.

To illustrate the effect of that built-in compression, I encrypted the phrase "This sample has lots of repeated text. " The resulting PGP ciphertext was 241 bytes long. I then encrypted a buffer with 2,048 copies of that phrase. The ciphertext was only 1,053 bytes long, a mere 4.4 times larger, despite the 2,048-times-larger cleartext.
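The arithmetic behind that comparison, and the reason the ciphertext grows so slowly, can be sketched as follows. The code only runs the compression step with Python's zlib module; it does not run PGP, so its byte counts will differ from the ciphertext sizes reported above.

```python
import zlib

phrase = b"This sample has lots of repeated text. "
one = phrase
many = phrase * 2048

# Compression step only; the encryption step is deliberately omitted.
one_c = zlib.compress(one)
many_c = zlib.compress(many)

cleartext_growth = len(many) / len(one)        # how much the input grew
compressed_growth = len(many_c) / len(one_c)   # how much the compressed form grew
reported_growth = 1053 / 241                   # ciphertext sizes quoted in the text

print(f"cleartext grew   {cleartext_growth:.0f}x")
print(f"compressed grew  {compressed_growth:.1f}x")
print(f"reported PGP ciphertext grew {reported_growth:.1f}x")
```

Because the repeated phrase collapses during compression, the data that actually gets encrypted grows far more slowly than the cleartext does.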

In summary, you don’t have to compress a file before encrypting it with PGP. On the other hand, as my colleague Mike Money pointed out, if you have several files to send to the same people, putting them into a single archive and then encrypting the archive makes perfect sense, even though archiving them first yields no significant additional reduction in size.
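Bundling several files into one archive before encryption might look like the sketch below, which uses Python's standard zipfile module. The file names and contents are hypothetical, and the encryption step itself is left out because it depends on the tool you use. The archive stores the files without compressing them, on the assumption that the encryption tool will compress the data anyway.

```python
import io
import zipfile

# Hypothetical files to send together (name -> contents).
files = {
    "report.txt": b"Quarterly figures go here.\n",
    "notes.txt": b"Remember to call the auditor.\n",
}

def bundle(files):
    """Pack the given files into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    # ZIP_STORED adds the files without compression.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()

archive_bytes = bundle(files)

# archive_bytes is the single item you would then hand to your encryption tool.
print("archive size:", len(archive_bytes), "bytes")
```

The recipients then decrypt one file and unpack it, instead of decrypting each attachment separately.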

This story, "Compression and encryption," was originally published by NetworkWorld.
