Why you should leverage deduplication across backup and archiving applications
The challenges of information growth are well documented. In fact, information costs are now automatically factored into all areas of a typical IT budget. Copies of all types of information are everywhere. While organizations want to "do more with less" and maximize efficiency, storage requirements continue to grow across the data center and within remote offices. From primary storage for applications and virtual machines to the storage required for backup and disaster recovery, fragmented islands of information develop, often each with its own tools, operations and service requirements.
Not surprisingly, this is driving a fundamental shift in the way organizations manage information. As organizations realize that yesterday’s tape-based approach is too cumbersome for today’s environment, more companies are turning to disk and deduplication technologies to speed up backups, reduce primary storage, and replace tape shipping for better disaster recovery.
In the real world, the adoption of disk and deduplication has been evolutionary, but not necessarily revolutionary. From virtual tape libraries and disk staging to target-based dedupe appliances, many organizations have learned that adding disk and deduplication to the current tape backup environment can greatly improve performance for the top 10 percent of their tier-one workloads. But how should organizations handle the bigger challenge of managing the vast amounts of structured and unstructured data that exist across their IT infrastructure?
When deduplication is applied only at the tail end of the process, multiple copies of exactly the same information continue to exist and grow on primary storage (e.g., email and file servers). This creates serious capacity management challenges wherever large amounts of information exist. The bottom line is, if it takes up space, it costs money.
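To make that capacity cost concrete, here is a minimal sketch in Python that finds byte-identical files under a directory tree (for example, a file-server share) and totals the space that keeping a single copy of each would reclaim. It assumes a matching SHA-256 content hash is an acceptable test for identical files, and the function names are hypothetical. It works at whole-file granularity only; many deduplication products work on smaller blocks or chunks, which also catches files that are only partly identical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicate_files(root):
    """Group files under `root` by content hash; keep only groups with 2+ copies."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_digest(path)].append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed by keeping one copy per group: every copy beyond the first."""
    return sum(
        os.path.getsize(path)
        for paths in duplicates.values()
        for path in paths[1:]
    )
```

Calling `find_duplicate_files()` on a share and passing the result to `reclaimable_bytes()` gives a rough estimate of how much primary storage is held by redundant copies.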
Instead of applying deduplication in one place, users should consider deduplication everywhere. To maximize efficiency, consider deduplicating in an integrated fashion across backup and archiving applications, as well as storage. By deduplicating data right at the source where it is created, organizations can tackle the real problem: reducing data everywhere information resides.
For example, one commercial construction company faced the growing challenge of managing business-critical information on 180 servers located at 45 remote sites around the United States. Backups at the remote locations were typically 100GB to 200GB per server, and it took nearly 12 hours to replicate each server back to a central data center. After moving to disk-based deduplication across its remote offices, the company saw replicated data shrink to 10GB to 15GB, and the backup window shrank because it was able to reduce the amount of redundant data being moved by over 90 percent.
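The percentages in this example follow from a simple before/after calculation. The sketch below uses hypothetical helper names and the example’s own figures: the reduction is 1 - after/before, and the deduplication ratio is before/after, so a 10:1 ratio corresponds to a 90 percent reduction.

```python
def reduction_percent(before_gb, after_gb):
    """Share of the original data that no longer has to be moved, in percent."""
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    return 100.0 * (1.0 - after_gb / before_gb)


def dedupe_ratio(before_gb, after_gb):
    """Before:after ratio; 10.0 means 10:1."""
    if after_gb <= 0:
        raise ValueError("after_gb must be positive")
    return before_gb / after_gb


# Figures taken from the construction-company example.
for before, after in [(200, 15), (100, 10)]:
    print(
        f"{before}GB -> {after}GB: "
        f"{reduction_percent(before, after):.1f}% less data "
        f"({dedupe_ratio(before, after):.1f}:1)"
    )
```

Run as-is, it reports a 92.5 percent reduction (about 13.3:1) for 200GB replicated as 15GB, and a 90.0 percent reduction (10.0:1) for 100GB replicated as 10GB.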
The question is no longer whether to dedupe, but how to deploy deduplication and which approach best suits the IT environment.
In reality, deduplication is not about a single technology; it’s about selecting the right approach. This is why a number of storage vendors are working together to give organizations much greater flexibility in their dedupe options. With an interface between archiving and backup software and advanced disk-based storage appliances, organizations can now use an integrated platform that supports both source- and target-based dedupe and is also easy to manage.
Integrated deduplication within backup and archiving applications can provide unified data protection from the remote office/branch office to the data center, across disk and tape and across physical and virtual environments. These next-generation software solutions offer comprehensive protection and a single console for managing all backup and recovery operations. Centralized management adds integrated data archiving, migration, and retention capabilities that help meet governance and compliance regulations. Advanced reporting on backup and recovery operations also enables service-level management of all protected data in the enterprise.
Organizations often choose this approach because deduplication is integrated into their existing backup and recovery application, which lets them build a more customized solution. It also lets them take advantage of advanced data protection features such as storage lifecycle policies, media server load balancing, shared disk pools, and more.
Organizations may also opt for integrated source (client-side) deduplication because it deduplicates backup streams in-line, which may make it more likely to pinpoint duplicate content. Because it is software-based, it is also simpler to deploy in remote locations. Furthermore, deduplicating as close to the source as possible reduces the storage footprint, and because data is deduplicated before it is transmitted over WAN connections, it frees up critical network resources.
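The practical difference between target-based and source-based dedupe is where duplicates are detected, and therefore how many bytes cross the WAN. The following is a simplified sketch, not any vendor’s implementation: it assumes fixed 4KB chunks (many products use variable-size, content-defined chunks instead), SHA-256 fingerprints, and a hypothetical `ChunkStore` at the central data center. For simplicity it counts one 32-byte fingerprint per chunk and ignores protocol overhead and batching.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def split_chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical deduplicating store at the central data center."""

    def __init__(self):
        self._chunks = {}  # SHA-256 hex digest -> chunk bytes

    def has(self, digest):
        return digest in self._chunks

    def put(self, digest, chunk):
        self._chunks.setdefault(digest, chunk)  # keep only the first copy

    def stored_bytes(self):
        return sum(len(c) for c in self._chunks.values())


def target_side_backup(data, store):
    """Target dedupe: every byte crosses the WAN; duplicates are dropped on arrival."""
    for chunk in split_chunks(data):
        store.put(hashlib.sha256(chunk).hexdigest(), chunk)
    return len(data)  # bytes sent over the WAN


def source_side_backup(data, store, fingerprint_bytes=32):
    """Source dedupe: send a fingerprint per chunk, then only chunks the store lacks."""
    sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        sent += fingerprint_bytes  # raw SHA-256 digest sent as a lookup
        if not store.has(digest):
            store.put(digest, chunk)
            sent += len(chunk)  # only new content is transmitted
    return sent
```

For a backup stream of 20 chunks containing only 2 unique ones, both functions leave the same 2 chunks in the store, but the target-side version transmits all 20 chunks while the source-side version transmits 20 fingerprints plus 2 chunks. A repeat backup of unchanged data sends only fingerprints.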
Considerations for Choosing the Right Approach
Deduplicating data everywhere lets organizations benefit from robust management and tight integration with the backup/recovery platform without being locked into a specific hardware vendor or software solution. Organizations should weigh several factors to make the right decision about deduplication.
1. Think long-term about your dedupe strategy. In other words, don’t be “reactive” with your disk and dedupe purchases. Most projects fail because they take a temporary band-aid approach, fixing only one part of the storage problem rather than addressing the whole problem. Instead, consider how to get the most out of a dedupe purchase to address challenges today and in the future. How much data will I have in five years? Could I eliminate all trucking of tapes? Can I consolidate my operations from multiple sites into a few core data centers? By asking the right questions, users will be better positioned to think holistically about dedupe needs across primary, secondary and archive storage.
2. Consider the entire architecture to achieve the best results. Deduplication isn’t all about buying the highest-performance (and highest-cost) solution on the market. Not all backups have the same requirements, so why should they all be deduped the same way? Users should look for ways to “right size” and “right price” their dedupe strategy. One way is to use source dedupe and archiving to reduce data as early as possible, which maximizes efficiency and lowers the cost of deduplication. You can then reserve high-performance hardware for tier-one workloads and use replicated disk solutions as a lower-cost alternative to high-end disaster recovery.
3. Archiving may be the most important tool in the dedupe arsenal. Primary storage (email, file systems, etc.) is the fastest-growing area where data redundancy problems exist. As end users share more information than ever before, organizations need a common platform for long-term retention and intelligent information management. With software-based archiving, organizations can achieve an 80 percent reduction in primary application storage. With less data to protect, backups run faster, application performance improves, and more options open up for server consolidation.
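The first question in item 1, “How much data will I have in five years?”, can be roughed out with compound growth. In the sketch below, the 100TB starting point, 40 percent annual growth and 10:1 dedupe ratio are placeholder assumptions for illustration, not figures from this article; substitute your own measurements.

```python
def project_capacity(current_tb, annual_growth, years, dedupe_ratio=1.0):
    """Return (year, raw_tb, deduped_tb) rows, compounding growth once per year.

    annual_growth is a fraction (0.4 means 40 percent a year) and dedupe_ratio
    is before:after (10.0 means 10:1). Both are planning assumptions.
    """
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    rows = []
    for year in range(years + 1):
        raw_tb = current_tb * (1 + annual_growth) ** year
        rows.append((year, raw_tb, raw_tb / dedupe_ratio))
    return rows


# Placeholder inputs: 100TB today, 40% annual growth, 10:1 deduplication.
for year, raw_tb, deduped_tb in project_capacity(100, 0.4, 5, dedupe_ratio=10):
    print(f"Year {year}: {raw_tb:7.1f} TB raw, {deduped_tb:6.1f} TB after dedupe")
```

With those placeholder inputs, raw capacity grows to about 537.8TB by year five, or about 53.8TB after 10:1 deduplication. Because growth compounds, small changes in the assumed rate move the five-year figure considerably.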
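One common way archiving removes redundancy from email (item 3) is single-instance storage: an attachment sent to many mailboxes is kept once and referenced from each message. The class below is a hypothetical, in-memory illustration of that idea, keyed by SHA-256 content hash; it is not any product’s API and says nothing about how the 80 percent figure in item 3 was measured.

```python
import hashlib


class SingleInstanceArchive:
    """Hypothetical archive that stores each unique attachment only once."""

    def __init__(self):
        self.blobs = {}     # SHA-256 hex digest -> attachment bytes
        self.messages = {}  # message id -> list of attachment digests

    def archive(self, message_id, attachments):
        """Record a message; store each attachment's bytes only if unseen."""
        digests = []
        for data in attachments:
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)
            digests.append(digest)
        self.messages[message_id] = digests

    def retrieve(self, message_id):
        """Rebuild a message's attachments from the shared blob store."""
        return [self.blobs[d] for d in self.messages[message_id]]

    def logical_bytes(self):
        """Bytes the messages would occupy if every copy were stored."""
        return sum(len(self.blobs[d]) for ds in self.messages.values() for d in ds)

    def stored_bytes(self):
        """Bytes actually stored after single-instancing."""
        return sum(len(b) for b in self.blobs.values())
```

If the same 1,000-byte attachment is archived with five messages, `logical_bytes()` counts 5,000 bytes but `stored_bytes()` counts only 1,000.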
For most organizations, the best solution is to move dedupe closer to information sources by integrating it into backup, whether in software or hardware. As a result, organizations can stop buying storage and reuse what they have, while also recovering data faster, whether it is an individual file or an entire server. This approach also lets organizations increase their return on virtualization by consolidating storage as well as servers. By working with vendors that are ‘open’ and offer a choice of deduplication approaches, organizations will be able to use multiple deduplication technologies and deploy deduplication everywhere.
Matt Kixmoeller is Vice President, Enterprise Product Management, Information Management Group, Symantec Corp.