The challenge faced by some customers (especially large ones) was that many backup systems didn't know how to share a large disk system and use it for backups. Sure, they could back up to a disk drive, but what if you needed to share that disk drive among multiple backup servers? Many backup products still can't do that, especially with Fibre Channel-connected disk drives. Enter the virtual tape library, or VTL. It solved this sharing problem by presenting the disk drives as tape libraries, which backup software products had already learned how to share. Now you could share a large disk system among multiple servers.
In addition, customers more familiar with a tape interface were presented with a very easy transition to backing up to disk.
Another approach to creating a shareable disk target is the intelligent disk target, or IDT. Vendors of IDT systems felt the best approach was to use the NFS or CIFS protocol to present the disk system to the backup system. These protocols also allowed for easy sharing among multiple backup servers.
But both VTL and IDT vendors had a fundamental problem: the cost of disk made their systems cost-effective only as staging devices. Customers stored a single night's backups on disk and then quickly streamed them off to tape. They wanted to store more backups on disk, but they couldn't afford it. Enter data deduplication.
The magic of data deduplication
Typical backups create duplicate data in two ways: repeated full backups and repeated incrementals of the same file when it changes multiple times. A deduplication system identifies both situations and eliminates redundant files, reducing the amount of disk necessary to store your backups by anywhere from 10:1 to 50:1 and beyond, depending on the level of redundancy in your data.
Deduplication systems also work their magic at the subfile level. To do so, they identify segments of data (a segment is typically smaller than a file but bigger than one byte) that are redundant with other segments and eliminate them. The most obvious use for this technology is to allow users to switch from disk staging strategies (where they're storing only one night's worth of backups) to disk backup strategies (where they're storing all onsite backups on disk).
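To make the subfile idea concrete, here is a minimal Python sketch of segment-level deduplication. It is an illustration under stated assumptions, not how any particular product works: the fixed 4,096-byte segment size, the use of SHA-256 to recognize identical segments, and the names SegmentStore, write_backup, and read_backup are all hypothetical choices made for this example. Real systems may pick segment boundaries and detect redundancy quite differently.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size in bytes; real products vary


class SegmentStore:
    """Toy deduplicating store: each unique segment is kept once, keyed by its hash."""

    def __init__(self, segment_size=SEGMENT_SIZE):
        self.segment_size = segment_size
        self.segments = {}      # SHA-256 hex digest -> segment bytes (stored once)
        self.logical_bytes = 0  # total bytes the backup software has written

    def write_backup(self, data):
        """Store one backup stream and return its recipe (ordered segment hashes)."""
        recipe = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            digest = hashlib.sha256(segment).hexdigest()
            # Keep the segment only if this hash has not been seen before.
            self.segments.setdefault(digest, segment)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read_backup(self, recipe):
        """Rebuild a backup stream from its recipe."""
        return b"".join(self.segments[digest] for digest in recipe)

    def stored_bytes(self):
        """Bytes actually occupied by unique segments."""
        return sum(len(segment) for segment in self.segments.values())

    def ratio(self):
        """Logical bytes written divided by bytes actually stored."""
        return self.logical_bytes / self.stored_bytes()


# Two nightly "full backups" of eight segments each; only one segment changes.
store = SegmentStore()
monday = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(8))
tuesday = monday[:SEGMENT_SIZE] + bytes([99]) * SEGMENT_SIZE + monday[2 * SEGMENT_SIZE:]

monday_recipe = store.write_backup(monday)
tuesday_recipe = store.write_backup(tuesday)

print(f"unique segments stored: {len(store.segments)}")
print(f"deduplication ratio: {store.ratio():.2f}:1")
```

In this toy run the second full backup adds only the one changed segment to disk, and either night can still be restored from its recipe. The ratio stays small because there are only two backups; the more repeated fulls you retain, the higher it climbs, which is why the ratios quoted above depend on the level of redundancy in your data.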
There are two main types of deduplication: