Inline deduplication can cut the amount of data stored by a ratio of about 20:1, but critics say it isn't scalable, can hurt performance and can force users to buy more servers to perform the deduplication. On the other hand, Schulz says that postprocessing deduplication requires more storage as a buffer, making that space unavailable for other uses.
For customers with multiple servers or storage platforms, enterprisewide deduplication saves money by eliminating duplicate copies of data stored on the various platforms. This is critical because most organizations create as many as 15 copies of the same data for use by applications such as data mining, ERP and customer relationship management systems, says Randy Chalfant, vice president of strategy at disk-based storage vendor Nexsan Corp. Users might also want to consider a single deduplication system to make it easier for any application or user to "rehydrate" data (return it to its original form) as needed and avoid incompatibilities among multiple systems.
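To make the idea concrete, here is a minimal sketch of a single deduplicated store shared by several applications. It is an illustration only, not how any vendor named here implements the feature: the DedupStore class, its fixed 1,024-byte chunks and the SHA-256 fingerprints are all assumptions chosen for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): identical chunks are kept once."""

    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes, stored a single time
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if this fingerprint has not been seen before.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def rehydrate(self, name):
        """Return the file in its original form by reassembling its chunks."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# The same 4,096-byte record is saved by three applications.
record = b"".join(i.to_bytes(4, "big") for i in range(1024))
store = DedupStore()
for application in ("data_mining", "erp", "crm"):
    store.put(application, record)

logical_bytes = 3 * len(record)        # what the applications think they stored
physical_bytes = store.stored_bytes()  # what the store actually holds
```

Because all three applications go through one store, any of them can rehydrate the record, and only one physical copy is kept.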
Schulz says primary deduplication products could perform in preprocessing (inline) mode until a certain performance threshold is hit, then switch to postprocessing.
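A rough sketch of that kind of switch follows. It treats preprocessing as deduplicating each block as it arrives, and everything in it is hypothetical: the HybridDeduper class, the load figure passed to write() and the 0.8 threshold are stand-ins for whatever metric a real product would monitor.

```python
import hashlib


class HybridDeduper:
    """Hypothetical sketch: dedupe on arrival until load crosses a threshold."""

    def __init__(self, load_threshold=0.8):
        self.load_threshold = load_threshold
        self.unique = {}   # digest -> block, the deduplicated storage
        self.staging = []  # raw blocks buffered for later postprocessing

    def _dedupe(self, block):
        self.unique.setdefault(hashlib.sha256(block).hexdigest(), block)

    def write(self, block, load):
        """Store a block; `load` is an assumed 0.0-1.0 system load reading."""
        if load < self.load_threshold:
            self._dedupe(block)       # deduplicate before the write completes
            return "inline"
        self.staging.append(block)    # too busy: land the raw block in the buffer
        return "postprocess"

    def drain(self):
        """Deduplicate buffered blocks later, when the system is quieter."""
        while self.staging:
            self._dedupe(self.staging.pop())


deduper = HybridDeduper(load_threshold=0.8)
modes = [deduper.write(b"block-A", load) for load in (0.2, 0.9, 0.95)]
staged_before_drain = len(deduper.staging)
deduper.drain()
```

The staging list is the extra buffer space that postprocessing requires: it holds undeduplicated copies until drain() runs.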
Another option, policy-based deduplication, allows storage managers to choose which files should undergo deduplication, based on their size, importance or other criteria.
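Such a policy can be as simple as a filter applied to each file before deduplication runs. The sketch below is an assumption-laden example, not a real product's policy engine: the FileInfo record, the importance labels and the 1 MB size cutoff are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    importance: str  # hypothetical labels: "low", "normal" or "critical"


def should_dedupe(info, min_size=1_000_000, exempt=("critical",)):
    """Assumed policy: dedupe large files unless they are marked critical."""
    return info.size_bytes >= min_size and info.importance not in exempt


files = [
    FileInfo("archive_2009.tar", 750_000_000, "low"),
    FileInfo("orders.db", 40_000_000_000, "critical"),
    FileInfo("notes.txt", 2_000, "normal"),
]
selected = [f.name for f in files if should_dedupe(f)]
```

Here only the large, low-importance archive is selected; the critical database is left in its original form so access to it is never slowed by rehydration, and the small text file is skipped because the savings would be negligible.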
SFL Data, which gathers, stores, indexes, searches and provides data for companies and law firms involved in litigation, has found a balance between performance and data reduction. It's deploying Ocarina Networks' 2400 Storage Optimizer for "near-online" storage of compressed and deduplicated files on a BlueArc Mercury 50 cluster that scales up to 2 petabytes of usable capacity, rehydrating those files as users require them.
"Rehydrating the files slows access time a bit, but it's far better than telling customers they have to wait two days" to access those files, says SFL's technical director, Ruth Townsend, noting that the company gets as much as 50% space savings through deduplication and file compression.
Probably the best-known data reduction technology, compression is the process of finding and eliminating repeated patterns of bytes. It works well with databases, e-mail and files, but it's less effective for images. It's included in some storage systems, but you can also find stand-alone compression applications or appliances.
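The difference is easy to see with a general-purpose compressor. The sketch below uses Python's standard zlib module; the sample data is made up, and the random bytes are only a stand-in for an already-compressed image format such as JPEG, which likewise has few repeated byte patterns left to remove.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower means more savings)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text, loosely resembling rows exported from a database.
table_rows = b"customer_id,order_id,status\n" * 2000

# Random bytes as a proxy for image data that is already compressed.
image_like = os.urandom(len(table_rows))

text_ratio = compression_ratio(table_rows)
image_ratio = compression_ratio(image_like)
```

The repetitive rows shrink to a small fraction of their original size, while the image-like data stays essentially the same size.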
Dedupe and Compression: Better Together?