Data Deduplication and Backup

In the world of backup and disaster recovery, we're constantly looking for performance advantages and ways to save space, especially as storage requirements keep growing year after year.

Data deduplication sounds like a complicated subject, but it's really just what it sounds like: removing redundant data from storage. It's a simple concept, and it provides a great way to keep storage costs under control, speed up access, and ultimately boost productivity. Deduplication is especially relevant in a virtual server environment, where hundreds of virtual machines may be scattered across different locations and each one contains a large amount of duplicate data (identical operating system files, for example). And here's the big issue: storage may be cheap, but companies grow quickly and their data requirements grow even faster, so we need technology that keeps those requirements down. Data deduplication simply lets you use your storage and backup media for a longer period of time, since you can fit more data on it than would otherwise be possible.

Here's how it works: the deduplication engine looks for redundant data, and when it encounters a data block it has already stored, it writes a small marker that points to the existing block instead of storing another copy of the same data. How much space can you save? It's significant: some studies have shown ratios as high as 20:1. That's a big savings, both in not having to buy more storage media and in reduced administrative overhead.
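To make the idea concrete, here is a minimal sketch in Python of block-level deduplication using content hashes. The class name, the fixed 4 KB block size, and the in-memory dictionaries are all illustrative assumptions, not how any particular backup product implements it; real systems use variable-size chunking, on-disk indexes, and more robust collision handling.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; real products vary


class DedupStore:
    """Toy content-addressed block store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}   # block hash -> block bytes (stored once)
        self.recipes = {}  # file name -> ordered list of block hashes

    def write_file(self, name, data):
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if this content hasn't been seen before;
            # otherwise the recipe just points at the existing copy.
            if digest not in self.blocks:
                self.blocks[digest] = block
            recipe.append(digest)
        self.recipes[name] = recipe

    def read_file(self, name):
        # Rebuild the file by following the pointers in its recipe.
        return b"".join(self.blocks[h] for h in self.recipes[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    payload = b"A" * 8192 + b"B" * 4096           # two identical "A" blocks plus one "B" block
    store.write_file("vm1.img", payload)
    store.write_file("vm2.img", payload)           # a full duplicate of the first file
    logical = 2 * len(payload)
    print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
    # 24576 logical bytes, but only 8192 bytes (two unique blocks) actually stored.
```

Writing two identical files stores their shared blocks only once; everything else is just pointers, which is where the space savings come from.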

Another advantage is in disaster recovery. If your disaster recovery plan calls for replicating all of your data to an off-site data center, deduplication lowers the off-site storage requirements as well. It can be applied in several different areas, in local backup as well as in remote backup. If you back up multiple remote sites across a WAN, deduplication saves a great deal of space, and a welcome side effect is that it also improves throughput, since there's less data to send over the link. A sketch of that idea follows below.
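Here is a hedged sketch of how dedup-aware replication can cut WAN traffic: only blocks the remote site doesn't already hold get transmitted. The `remote_hashes` set stands in for the fingerprint index a real replication protocol would negotiate with the remote site; the function name and block size are hypothetical.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # illustrative; real replication engines negotiate chunk sizes


def replicate(data, remote_hashes):
    """Send only the blocks the remote site lacks; skip the ones it already has."""
    to_send, saved = [], 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest in remote_hashes:
            saved += len(block)          # remote already holds this block: no transfer needed
        else:
            to_send.append((digest, block))
            remote_hashes.add(digest)    # remember it for subsequent backup runs
    return to_send, saved


baseline = os.urandom(40960)                      # pretend this is the first full backup
changed = baseline[:36864] + os.urandom(4096)     # next backup: only the last block changed

remote = set()
first, _ = replicate(baseline, remote)
second, saved = replicate(changed, remote)
print(f"first run sent {len(first)} blocks; second run sent {len(second)}, skipping {saved} bytes")
```

The first run ships everything, but the second, mostly unchanged backup sends only the one modified block, which is why deduplication both shrinks off-site storage and speeds up replication over a slow WAN link.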
