Zipping your way to free space: Part 3

By Sandra Henry-Stocker, ITworld.com | Operating Systems

In the last two columns (Part 1, Part 2), we contrasted a number of compression tools that you might use to free up disk space on your systems. In comparing compression utilities, the most important issues to consider are speed, the compression ratio (how much space you can expect to save), portability and reliability. Which of these factors ranks highest in your list depends on your application, but the compression tool that you use routinely should work well on most of the files that you need to compress, giving you reasonably good file size reduction and acceptable performance.

What we didn't consider in the earlier columns were decompression time (we looked only at compression time) and the possibility that some of the compression utilities may not work at all given extremely large files. We also didn't get into the issue of patent restrictions -- whether there are hidden strings attached to your use of particular compression tools.

Decompression Timing

With respect to timing, decompression time can be considerably more important than compression time. Why? Because the files that you compress may be delivered to any number of individual users or customers. In other words, every file that is compressed may be uncompressed tens or hundreds of times and is likely to be uncompressed at considerably more time-critical moments.

In glossing over the issue of decompression time, we might have assumed that compression and decompression times would be roughly equivalent for any particular compression tool. This assumption, however, proves not to be the case, particularly with very large files. For some compression tools, the compression operation takes MUCH longer than the corresponding decompression. For others, the compression may be slightly faster. For every tool, however, the contents of a file, not just its size, determine how quickly it will be compressed and decompressed. The script presented last week has, therefore, been modified and is presented again below -- this time measuring decompression time along with compression time, with some surprising results.
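To get a rough feel for the difference on your own files, something along these lines may help. This is only a minimal sketch and not the script from this column: the function name time_roundtrip is made up for illustration, it exercises only gzip, and it assumes that your date command supports +%s (most do, though POSIX doesn't require it). Because it counts whole seconds, the numbers are only meaningful for large files.

```shell
#!/bin/sh
# time_roundtrip FILE
# Hypothetical helper: compress FILE with gzip, decompress it again,
# and report the elapsed wall-clock time (whole seconds) for each step.
# FILE ends up back in its original, uncompressed state.
time_roundtrip() {
    file=$1
    start=$(date +%s)                 # assumes date supports +%s
    gzip "$file" || return 1          # FILE becomes FILE.gz
    mid=$(date +%s)
    gunzip "$file.gz" || return 1     # FILE.gz becomes FILE again
    end=$(date +%s)
    echo "gzip: compress=$((mid - start))s decompress=$((end - mid))s"
}

# Example (the file name is a placeholder):
#   time_roundtrip /var/tmp/bigfile.tar
```

Swapping bzip2/bunzip2 in for gzip/gunzip (and .bz2 for .gz) should give a comparable measurement for that tool.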

Working with Very Large Files

If the files that you need to compress are particularly large, you may need to verify that the tool you want to use can compress and decompress them. Some compression utilities may break down when asked to process extremely large files. For example, the pack command may issue the following error message when asked to compress a 2 Gbyte file -- a reference to limitations of the particular compression algorithm (Huffman encoding) that it uses:

Huffman tree has too many levels - file unchanged
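One way to check a tool against a particular large file before you depend on it is to push the file through a compress/decompress round trip and compare the result with the original. The sketch below does this with gzip as the example tool; the function name verify_roundtrip is hypothetical. It streams the data through a pipe, so no extra disk space is needed for the test.

```shell
#!/bin/sh
# verify_roundtrip FILE
# Hypothetical helper: returns 0 if FILE survives a gzip/gunzip round trip
# byte for byte, non-zero otherwise. FILE itself is never modified.
verify_roundtrip() {
    # cmp reads the decompressed stream on stdin ("-") and compares it
    # with the original; -s suppresses output, so only the exit status matters.
    gzip -c "$1" | gunzip -c | cmp -s - "$1"
}

# Example (the file name is a placeholder):
#   verify_roundtrip /var/tmp/bigfile.tar && echo "round trip OK"
```

Any tool that can write its output to standard output could be substituted for gzip and gunzip here.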

Working with Very Small Files

If the files that you need to work with are particularly small, you may see little or no space saving from compressing them. In fact, some files will even grow in size if you force a compression, and some tools will simply refuse to operate on files that are extremely small -- as shown here.

First, we create an empty file:

> touch zerobytes

Then, we attempt to "compress" it, but the compress command refuses:

> compress zerobytes
zerobytes: -- file unchanged

Next, we gzip the file and look at the size of the resultant file:

> gzip zerobytes
> ls -l zero*
-rw-r--r-- 1 shs staff 30 Feb 8 16:19 zerobytes.gz

After we gunzip the file, we compress it again, this time with bzip2:

> bzip2 zerobytes
> ls -l zero*
-rw-r--r-- 1 shs staff 14 Feb 8 16:19 zerobytes.bz2

After we bunzip2 the file, we compress it with zip:

> zip zerobytes.zip zerobytes
adding: zerobytes (stored 0%)
> ls -l zer*
-rw-r--r-- 1 shs staff 0 Feb 8 16:19 zerobytes
-rw-r--r-- 1 shs staff 150 Feb 8 16:22 zerobytes.zip

Clearly, compression of an empty file yields a non-empty file. The small amount of content, reflecting some overhead associated with the particular compression algorithm, is negligible but points to the fact that compression isn't always a good thing.
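If you would like to see the overhead for a given file without actually replacing it, a small sketch like the following may do. The function name size_change is made up for illustration, and only gzip is tried. Because the data is fed to gzip on standard input, no file name is stored in the output, so the byte counts may differ slightly from what ls reports for a real .gz file.

```shell
#!/bin/sh
# size_change FILE
# Hypothetical helper: report FILE's size before and after gzip compression
# without modifying FILE. The $(( ... )) wrapper strips the leading blanks
# that some versions of wc print.
size_change() {
    file=$1
    before=$(( $(wc -c < "$file") ))
    after=$(( $(gzip -c < "$file" | wc -c) ))
    if [ "$after" -ge "$before" ]; then
        echo "$file: $before -> $after bytes (no saving)"
    else
        echo "$file: $before -> $after bytes (saved $((before - after)))"
    fi
}

# Example:
#   size_change zerobytes
```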

When It's Not Worth It

While file compression can be a sysadmin's friend, not every large file is worth compressing. I've run into numerous cases in which a tar file compresses down to only 90-95% of its original size, a saving of just 5-10%. In cases such as this, the savings are hardly worth the time and trouble of compressing and decompressing. Unless there is something to be gained by keeping your files in the same format (e.g., compressed tar files), you might as well not bother.
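A decision like this can be automated, at least roughly. The sketch below compresses a file only if a trial run shows that the result would be at or below some percentage of the original size. The function name compress_if_worthwhile and the default cutoff of 90 percent are assumptions chosen for illustration, not recommendations, and gzip stands in for whatever tool you normally use.

```shell
#!/bin/sh
# compress_if_worthwhile FILE [MAX_PERCENT]
# Hypothetical helper: gzip FILE only if the compressed size would be at
# most MAX_PERCENT (default 90, an arbitrary cutoff) of the original size.
compress_if_worthwhile() {
    file=$1
    max_percent=${2:-90}
    before=$(( $(wc -c < "$file") ))
    if [ "$before" -eq 0 ]; then
        echo "$file: empty, left alone"
        return 0
    fi
    # Trial compression to a pipe; the file itself is untouched so far.
    after=$(( $(gzip -c < "$file" | wc -c) ))
    percent=$(( after * 100 / before ))
    if [ "$percent" -le "$max_percent" ]; then
        gzip "$file" && echo "$file: compressed to about $percent% of original"
    else
        echo "$file: would be about $percent% of original, left alone"
    fi
}

# Example (the file name is a placeholder):
#   compress_if_worthwhile archive.tar 90
```

Note that a file that passes the test is compressed twice, once for the trial and once for real, so this approach trades extra CPU time for the certainty that the effort pays off.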

The New Script
