InfoWorld review: Data deduplication appliances

Data deduplication appliances from FalconStor, NetApp, and Spectra Logic provide excellent data reduction for production storage, disk-based backups, and virtual tape

NetApp System Manager makes it easy to define a deduplication policy for each disk share on the appliance.

During my tests, I stored multiple daily Backup Exec backup sets from each server on a NetApp CIFS volume. Regardless of how or when the deduplication engine analyzed the stored backup files, I never got better than about 8% data reduction on the volume. Exchange message stores fared somewhat better, averaging a 12% reduction in disk usage.

When I asked NetApp why, I was told that the deduplication engine works on fixed 4KB blocks. Apparently the Backup Exec family of software inserts metadata into its backup files, which throws off the alignment of the 4KB boundaries and makes it much harder for NetApp to find duplicate data. Symantec changed Enterprise Vault 8.0 to align its blocks with the NetApp engine, so not all Symantec products suffer from this misalignment. Backup software from some other vendors, including CommVault and VMware, keeps the 4KB block boundaries intact.
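A minimal Python sketch can show why a few inserted bytes hurt a fixed-block engine. It is not NetApp's code: it assumes a simple scheme that splits each stream into 4KB blocks, hashes each block with SHA-256, and counts how many blocks are repeats. The 80-byte header is a hypothetical stand-in for the metadata Backup Exec writes, not its real format.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # fixed 4KB blocks, matching the block size NetApp described


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data at fixed offsets and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def dedupe_savings(streams: list[bytes]) -> float:
    """Fraction of blocks eliminated when all streams share one block store."""
    total = 0
    unique: set[str] = set()
    for stream in streams:
        hashes = block_hashes(stream)
        total += len(hashes)
        unique.update(hashes)
    return 1 - len(unique) / total if total else 0.0


# 1MB of non-repeating data stands in for one night's backup file.
payload = random.Random(42).randbytes(BLOCK_SIZE * 256)

# Second night, same data at the same offsets: every block lines up.
aligned_copy = payload
# Second night, same data behind a hypothetical 80-byte header: every boundary shifts.
shifted_copy = b"METADATA" * 10 + payload

print(f"aligned: {dedupe_savings([payload, aligned_copy]):.0%}")  # aligned: 50%
print(f"shifted: {dedupe_savings([payload, shifted_copy]):.0%}")  # shifted: 0%
```

The 50% result is simply the second copy disappearing entirely; the shifted copy shares no block with the first even though its payload is byte-for-byte identical. Engines that cut chunks at content-defined boundaries instead of fixed offsets are less sensitive to this kind of shift, which is why aligning the backup format, as Symantec did for Enterprise Vault 8.0, matters for a fixed-block engine.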

Admins can define a deduplication policy on a per-volume basis. The policy engine doesn't provide an overwhelming number of options, but it gets the job done. A policy can run dedupes manually on demand, automatically when a specified amount of new data lands on the volume, or on a schedule based on time of day or day of week. I was able to create a daily policy for one volume that ran every hour from 9 a.m. to 10 p.m. Outside the most extreme cases this is overkill, but the option is there if needed, and it worked flawlessly.
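The three trigger types can be modeled in a few lines. This Python sketch is hypothetical: the function names and parameters are illustrative, not NetApp's policy settings, day-of-week scheduling is left out for brevity, and it assumes the 10 p.m. stop time counts as the last hourly run.

```python
from datetime import datetime, time
from typing import Optional, Sequence


def scheduled_runs(start_hour: int, stop_hour: int, every_hours: int = 1) -> list[time]:
    """Run times in a daily window; the stop hour is treated as the last run (assumption)."""
    return [time(hour=h) for h in range(start_hour, stop_hour + 1, every_hours)]


def should_run(
    now: datetime,
    new_bytes: int,
    manual: bool = False,
    threshold_bytes: Optional[int] = None,
    runs: Sequence[time] = (),
) -> bool:
    """Return True if any of the three trigger types fires at `now`."""
    if manual:  # on-demand run started by an admin
        return True
    if threshold_bytes is not None and new_bytes >= threshold_bytes:
        return True  # enough new data has landed on the volume
    # Scheduled trigger: fires only at the top of a listed hour.
    return now.time().replace(second=0, microsecond=0) in runs


runs = scheduled_runs(9, 22)  # every hour from 9 a.m. to 10 p.m.
print(len(runs))  # 14
print(should_run(datetime(2010, 6, 1, 14, 0), new_bytes=0, runs=runs))  # True
print(should_run(datetime(2010, 6, 1, 23, 0), new_bytes=0, runs=runs))  # False
```

Under this reading, the hourly 9-to-10 policy from the test amounts to 14 dedupe passes a day, which shows why it is overkill for most volumes.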

IT has two options for managing the FAS2040: a Web browser or the stand-alone management console, NetApp System Manager. While the browser-based management portal was straightforward, I found System Manager much more user-friendly and intuitive, even more so than FalconStor's UI. Both storage controllers appear in System Manager, with each major function broken out into grouped tasks, which makes specific items very easy to locate.

As with FalconStor and Spectra Logic, there isn't a fancy reporting engine. There are, however, useful graphs and data points, such as volume details and space saved, scattered throughout System Manager. NetApp did a good job of organizing System Manager so that the information it presents is relevant and useful without inundating you with data.

I was really impressed with the NetApp FAS2040, both in terms of hardware options and manageability. The appliance was very easy to integrate into my network and very easy to use. Deduplication was easy to manage, and the files and folders that typically reside on a file system deduped with great success. My only complaint is the poor results when deduping Backup Exec backup sets. Of course, no matter which deduplication solution you choose, you'll want to make sure it works with your backup software.

On this particular iSCSI volume, I was able to achieve 92% disk space savings due to the highly redundant nature of the file data.
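Savings percentages convert directly into the N:1 deduplication ratios often used to compare products. Taking savings as the fraction of logical data that no longer needs physical disk (the definition assumed here):

```latex
\text{savings} = 1 - \frac{\text{physical}}{\text{logical}},
\qquad
\text{ratio} = \frac{\text{logical}}{\text{physical}} = \frac{1}{1 - \text{savings}}
```

So 92% savings works out to 1/0.08 = 12.5:1, while the roughly 8% reduction seen with Backup Exec backup sets is only about 1.09:1 (1/0.92).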

Spectra Logic nTier v80 and nTier vX

The nTier family of deduplication appliances from Spectra Logic has a slightly different focus than FalconStor and NetApp. Its target market is IT shops that still rely primarily on tape and tape libraries for backup. The nTier appliances are VTL chassis that look like tape drives to the outside world, making it easy to add deduplication to the backup process. The appliances are highly scalable and modular, allowing for in-place upgrades and a long life span. Using FalconStor's deduplication engine, the Spectra Logic appliances do a very good job of reducing the overall size of backup data.

To see Spectra Logic's solution in action, I tested the nTier v80 and nTier vX VTL deduplication appliances. The nTier v80 is a 3U rack-mount appliance with a storage capacity of 8TB to 16TB (RAID 6) using SATA drives. The nTier vX is a massive 4U chassis nearly twice as deep as a standard rack-mount server; its storage capacity runs from 10TB to 60TB (RAID 6), upgradable in 10TB increments. Both chassis come with dual multicore Intel Xeon CPUs and offer SCSI, Fibre Channel, and iSCSI (Gigabit Ethernet) interfaces for host connectivity. Redundant power supplies and lots of fans round out the hardware.

The key to the Spectra Logic solution is its close ties to existing tape backup systems. Each VTL dedupe appliance can emulate a wide range of physical tape drives and libraries. In my lab, I chose IBM Ultrium TD-3 (LTO3) drive emulation and defined 21 virtual tapes for six virtual tape drives. Spectra Logic supports eight types of tape drives and ten types of tape libraries.

As with FalconStor, I connected my Windows Server 2008 server to the nTier v80 over iSCSI and used Symantec Backup Exec 2010 to handle backup chores. The nTier vX was set up as a replication partner to the nTier v80, with deduplication taking place on the nTier v80 as soon as each backup completed and replication running at midnight. Both appliances worked flawlessly during my tests, and the nTier v80's VTL system "swapped out" tapes exactly as directed.

Spectra Logic licenses both FalconStor's deduplication engine and part of the FDS management console for its nTier appliances. Here we see the VTL definition and an at-a-glance view of storage system usage.

I ran Backup Exec agents on four Windows Server 2008 R2 virtual machines running on Microsoft Hyper-V, as well as on a physical Windows Server 2008 R2 server. After the initial backups, the deduplication engine did an excellent job of detecting redundant data in each subsequent backup. I even tried to fool it by renaming groups of folders without changing their contents. In each case, the deduplication engine recognized the data and greatly reduced both my backup size and my replication footprint. In my test configuration, deduplication ran after each backup completed, but I could just as easily have run it in parallel with the backup. Deduping in real time carries a slight performance penalty, so for most users post-processing is the way to go.
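Renaming folders does not fool a content-based engine because duplicates are identified by what the data contains, not by its name or path. The Python sketch below is a deliberately simplified, whole-file version of that idea; it is not FalconStor's engine, which works on the backup stream rather than on individual files, and the file names and sizes are made up for illustration.

```python
import hashlib


def store_backup(files: dict[str, bytes], store: dict[str, bytes]) -> int:
    """Add one backup's files to a content-addressed store.

    Returns the number of bytes actually written, i.e. data whose SHA-256
    digest was not already stored. Paths are never consulted for matching.
    """
    written = 0
    for content in files.values():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:
            store[digest] = content
            written += len(content)
    return written


store: dict[str, bytes] = {}

# Hypothetical first backup.
first = {
    "projects/alpha/report.docx": b"A" * 10_000,
    "projects/beta/budget.xlsx": b"B" * 20_000,
}
# Next backup: identical contents, but the "projects" folder was renamed "archive".
renamed = {path.replace("projects/", "archive/", 1): data for path, data in first.items()}

print(store_backup(first, store))    # 30000
print(store_backup(renamed, store))  # 0
```

The second call writes nothing because every digest is already in the store, which mirrors the shrinking backup and replication footprint observed in the rename test.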

When a backup is made to one of the virtual tape drives, the data is written to the virtual tape in the same format it would have on a physical tape. This allows a copy to be saved and deduped on the nTier appliance and then streamed to physical tape for off-site archival. One big advantage of VTL is that IT staff already using physical tape drives and libraries don't have to learn a new backup system; they keep using the backup programs and schedules they already know. And because each backup contains a catalog of its data, it's very easy to locate and restore files from the VTL.

Spectra Logic licenses FalconStor's deduplication engine and includes FalconStor's UI in its appliances for management purposes. All other aspects of the appliances are managed through Spectra Logic's own BlueScale management platform, which provides a common user interface across nTier and other Spectra Logic storage systems. From the BlueScale UI, I could see how effective the deduplication engine was, manage and maintain my virtual tape libraries, and define my replication schedule. I found it fairly intuitive after a few minutes of exploration.

The Spectra Logic nTier family of VTL appliances does an excellent job of standing in for physical tape drives and libraries, whether as a path away from physical tape or as a more efficient intermediary in front of it. Over iSCSI, each virtual drive looked like a physical drive to my backup software, and the deduplication engine worked well in all scenarios. The management UI was easy to navigate, although defining the virtual tape libraries was a little daunting. Nevertheless, for enterprises that want to keep the look and feel of tape but migrate to disk-based deduplication, the nTier family is a perfect fit.

With the exception of the VTL components, Spectra Logic's management console looks and functions exactly like FalconStor's, providing an excellent quick view of disk usage and deduplication statistics.

FalconStor
  • Good all-around deduplication performance
  • 10Gb Ethernet support
  • Native Symantec OST support
  • Limited scalability (32TB)
  • No Fibre Channel

NetApp FAS2040
  • Highly scalable (136TB)
  • Dual storage controllers support active/passive and active/active configurations
  • Supports NAS, SAN, and Fibre Channel in one chassis
  • Poor dedupe results with Backup Exec backup sets

Spectra Logic nTier v80 and nTier vX
  • Excellent "drop-in" VTL appliance
  • Dedupes to both virtual and physical tape libraries
  • Multiple connectivity options
  • Defining virtual tape libraries was a little difficult

This story, "InfoWorld review: Data deduplication appliances" was originally published by InfoWorld.
