This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
Deduplication, a fresh idea only a few years ago, has become a commodity, with organizations of all sizes deploying it as just another feature in their data protection and backup solutions. This is progress. More data centers can eliminate the redundant data in their backup and storage systems to save money and increase efficiency. However, the job is not done. With deduplication in place, IT leaders can move on to adopting intelligent capabilities to ensure data is properly stored and protected. In 2013, data center managers will push for global deduplication that provides flexibility, scalability, performance and high availability of data.
Simple deduplication capabilities don't inspire much awe these days, but that doesn't mean they aren't a major accomplishment. Less than a decade ago, enterprises were plagued by multiple copies of data in their tape-based systems. There was no cost-effective way to replicate all of those copies off site in a way that protected network bandwidth and the bottom line. Deduplication opened the door to cost-efficient data backup and replication.
By 2012, IT teams were using various methods to identify duplicate data and retain information about all of the data in storage. That capability has become essential, since data is growing at a rate of 50% to 60% annually, which increases the need for effective data protection and storage solutions.
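One common way to identify duplicate data is to split each backup into chunks, fingerprint every chunk with a cryptographic hash, and store a chunk only the first time its fingerprint is seen. The Python sketch below illustrates that idea under simple assumptions; it is not any particular product's method. The 4 KiB fixed chunk size and the `ChunkStore` class are hypothetical choices made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems vary


class ChunkStore:
    """Toy deduplicating store: keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes
        self.logical_bytes = 0  # total bytes written by clients

    def write(self, data):
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a chunk is kept.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())

    def dedupe_ratio(self):
        stored = self.stored_bytes()
        return self.logical_bytes / stored if stored else 1.0


store = ChunkStore()
monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE
tuesday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"D" * CHUNK_SIZE
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)
print(store.stored_bytes(), store.dedupe_ratio())  # 16384 1.5
```

The recipe is the "information about all of the data" that has to be retained: without it, the unique chunks cannot be reassembled into the original backups.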
But is simple deduplication enough today? Most shops view deduplication as a basic feature, but in reality it is a complicated activity that involves a number of resources and processes and requires ongoing attention from IT staff. Not all data deduplicates well, so IT still must monitor the data being stored in order to get the best utilization of deduplicated storage. A database that was backing up and replicating efficiently can suddenly fail to deduplicate well because compression or encryption was enabled at the database level.
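The encryption case can be illustrated with a small Python sketch. It assumes fixed 4 KiB chunks and uses a toy XOR keystream as a stand-in for real database encryption; the toy cipher is not secure and exists only to show the effect. The `database` contents, key and nonces are all hypothetical. Two identical backups deduplicate perfectly in the clear, but once each backup is encrypted with its own nonce, no chunk repeats.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def dedupe_ratio(streams):
    """Logical bytes divided by the bytes of unique fixed-size chunks."""
    unique = {}  # chunk fingerprint -> chunk length
    logical = 0
    for data in streams:
        logical += len(data)
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    stored = sum(unique.values())
    return logical / stored if stored else 1.0


def toy_encrypt(data, key, nonce):
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        segment = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(segment, keystream))
    return bytes(out)


# Hypothetical database image that does not change between two backups.
database = b"".join(b"row %08d payload\n" % i for i in range(20000))

plain_ratio = dedupe_ratio([database, database])
encrypted_ratio = dedupe_ratio([
    toy_encrypt(database, b"secret-key", b"monday"),
    toy_encrypt(database, b"secret-key", b"tuesday"),
])
print(plain_ratio)      # 2.0
print(encrypted_ratio)  # 1.0
```

Compression can have a similar effect, because a small change in the source data may alter much of the compressed stream that follows it. That case is not modeled here.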
Intelligent deduplication addresses some of the issues that are coming to the forefront now that organizations have mastered the more straightforward dedupe processes. In the coming year, IT leaders should look for deduplication capabilities that detect and report on data types. To adapt to those data types, you will need to apply different policy options: inline deduplication, post-process (or concurrent) deduplication, and no deduplication.
The first policy, inline deduplication, makes the most sense for small storage configurations or environments with immediate replication needs. This option minimizes storage requirements and can deduplicate and replicate data more quickly. The second, post-process deduplication, runs independently of ingest and can be scheduled for any point in time, including concurrently with the backup itself. By postponing deduplication, it can make transfers to physical tape and frequent restores more efficient. It also lets a deduplication solution make full use of available processing power while minimizing the impact on the incoming data stream. This process is geared toward multi-node clustered solutions, where it allows full use of all computing resources. Finally, some data types simply do not deduplicate effectively and should be excluded from deduplication policies, including image data and pre-compressed or encrypted data.
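As a rough illustration of how the three policies might be assigned automatically, the Python sketch below estimates the byte entropy of a data sample and combines it with two environment flags. Everything here is an assumption made for the example: the entropy heuristic, the 7.5 bits-per-byte threshold, the `choose_policy` function, and the simplification that a non-clustered system stands for a small storage configuration. Real products may detect data types in entirely different ways.

```python
import math
import os
from collections import Counter
from enum import Enum


class Policy(Enum):
    INLINE = "inline"
    POST_PROCESS = "post-process"
    NONE = "no deduplication"


# Assumed cutoff: compressed or encrypted data sits close to 8 bits per byte.
ENTROPY_THRESHOLD = 7.5


def entropy_bits_per_byte(sample):
    """Shannon entropy of the byte distribution in the sample."""
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def choose_policy(sample, needs_immediate_replication=False, clustered=False):
    """Pick a policy from a data sample and two hypothetical environment flags."""
    if entropy_bits_per_byte(sample) >= ENTROPY_THRESHOLD:
        # Looks pre-compressed or encrypted: unlikely to deduplicate.
        return Policy.NONE
    if needs_immediate_replication or not clustered:
        # Small configurations and replication-sensitive jobs go inline.
        return Policy.INLINE
    # Multi-node clusters can defer the work and use spare processing power.
    return Policy.POST_PROCESS


text_sample = b"customer record 42\n" * 500
random_sample = os.urandom(65536)  # stand-in for encrypted data

print(choose_policy(text_sample).value)                  # inline
print(choose_policy(text_sample, clustered=True).value)  # post-process
print(choose_policy(random_sample).value)                # no deduplication
```

A byte-entropy check is cheap but coarse. It cannot tell compressed data from encrypted data, so a production system would likely combine it with file-type or application metadata.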
Beware of the all-in-one approach
It can be tempting for enterprises and even small to midsize businesses to buy all-in-one storage or backup software solutions. Deduplication is just a commodity after all, right? Not so fast. Not all deduplication solutions deliver at the same level, and deduplication is not a solution that can be tacked onto an appliance or patched into software. When enterprises attempt to deploy such patchwork solutions, they often find limitations related to performance, scalability and reliability.
The requirements of individual IT shops still matter when it comes to dedupe. For example, solutions built for large enterprises with massive amounts of data and rich, heterogeneous environments must offer high availability, data protection failover capabilities, scalability and large data repositories. Some of these deployments must accommodate multiple data centers and remote offices, as well as consolidated data protection and disaster recovery resources that were cobbled together as the result of mergers and acquisitions. In these scenarios, deduplication is about simplifying processes and reducing costs.
However, some of those operations still need to integrate with physical tape, sometimes for regulatory reasons. For these users, there is no use echoing the cry, "Tape is dead!" For them, tape is very much alive and necessary, and dedupe tools that don't accommodate tape aren't as useful. Whatever the particular requirements of the data center, the deduplication offering needs to address them. Many of these deduplication features are desirable in smaller businesses, as well.
The start of a new year is a prime opportunity to reflect on recent accomplishments. Most businesses will enter 2013 having deployed simple deduplication that solves challenges that seemed intractable only a few years ago. The opportunity now is to move toward intelligent deduplication solutions that analyze data dynamically and automatically assign the dedupe policies that make the most sense for each IT environment.
FalconStor Software Inc. is a market leader in disk-based data protection. The company's mission is to transform traditional backup and disaster recovery into next-generation service-oriented data protection.
This story, "In 2013, deduplication smartens up" was originally published by Network World.