February 18, 2010, 4:22 PM — With data volumes erupting at companies big and small, stories of troublingly long backup windows and unreliable tape restores are becoming all too common. With such issues plaguing Vertafore, a technology and information solutions company for the insurance industry, Chris Munoz, vice president of hosting services, knew a rethink of the legacy approach was in order -- you just can't mess around when you have more than 380 terabytes of data under management, after all. Now, with a data protection overhaul all wrapped up, this Bothell, Wash., company can instantly access and restore from any recovery point within a 14-day period via consistent snapshots. In this interview with contributing writer Beth Schultz, Munoz recaps the project highlights and reflects on ...
BACKING UP DATA THE OLD WAY: We have a large SQL database environment -- lots of small plus some very large databases -- with about two terabytes of data for a Windows host. In a SQL backup operation, we would dump the data to a different disk drive on the same Windows server and then at a later time back up to tape using host transfer to the tape drive. As we scaled up our applications, this whole scenario began putting a lot of strain on the Windows host itself, given it actually had to maintain customer operations with access to the database and stream data to the tape drives. We knew we needed something that would eliminate all of that and let us back up from the array level.
THE NEW, CONTINUOUS AND VIRTUAL, WAY: We implemented a series of FalconStor [Software] data protection devices in the critical path from the server to the storage and now we virtualize the disk from virtual arrays, and that's working for us. Instead of presenting small LUNs from the storage array to a server, we present large LUNs to the device, which carves the data up and presents it to a number of different hosts. Then the device itself, in band, also mirrors that data to a CDP [continuous data protection] appliance. The CDP software does the snapshotting, keeping track of our changes and replicating data from our primary data center in Texas to our data center in Georgia.
PUTTING DEVICES IN THE CRITICAL PATH: When you deploy something as an in-band solution, you definitely want to be cautious. You've got to make sure that the array, firmware and software levels on all your different devices have been tested and are in the vendors' support matrices. In our case the storage vendors are HP and FalconStor. Each time we roll out firmware updates, we have to make sure they are supported by both vendors.
In other words, we might want to take a product to the latest level on the HP side, but that might not be a good thing because it could break something on the FalconStor side.
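The two-vendor check Munoz describes can be sketched in a few lines of Python. This is a minimal illustration, not either vendor's tooling: the matrix layout, the vendor keys and the version strings are all hypothetical placeholders, and real support matrices are published documents that someone would have to transcribe by hand.

```python
# Hypothetical support matrices. The vendor keys and version strings are
# placeholders for illustration, not real HP or FalconStor releases.
SUPPORT_MATRIX = {
    "storage_vendor": {"array_firmware": {"1.0", "1.1", "1.2"}},
    "virtualization_vendor": {"array_firmware": {"1.0", "1.1"}},
}


def unsupported_by(component, version, matrix=SUPPORT_MATRIX):
    """Return the sorted names of vendors whose matrix does not list this version."""
    return sorted(
        vendor
        for vendor, components in matrix.items()
        if version not in components.get(component, set())
    )


def safe_to_roll_out(component, version, matrix=SUPPORT_MATRIX):
    """A version is safe only if every vendor in the matrix lists it."""
    return not unsupported_by(component, version, matrix)
```

With these placeholder values, version "1.2" is the latest level on the storage side but fails the check because the second vendor has not yet listed it, which is exactly the situation described above.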
ON GOING TOTALLY IN-BAND: You don't want to go 100% in-band, direct-to-storage array, with no way to back out. We used a technology called service enablement that lets you take a LUN normally presented to the Windows host and instead present it through the FalconStor. FalconStor will keep track of that LUN and write all its header information somewhere else. It won't destroy anything within the header of that LUN. So to the Windows box it looks like the same drive with the same signatures and everything. If you later decide to back out, just 'unpresent' the LUN from the FalconStor device and present it back to the host, and the host doesn't know the difference.
ACCOUNTING FOR OVERHEAD: We've learned that there's a lot more overhead on the device that's keeping track of the backups and the replication than on the in-band appliances. So where you have one or a pair of in-band appliances, you may need two sideband appliances for the backups and replication, depending on the I/O going through them. There's a lot more read/write activity going on at that appliance because it's not only keeping track of the mirror and the snapshots. It's also the appliance that replicates the data to the recovery site for us, and it's the appliance to which we mount up snapshots from other boxes so we can back them up to tape.
UNDERSTANDING APPLICATION CHANGE RATE: Knowing what you need in the data protection environment depends on your applications' data change rates. We have many applications, some with quite low change rates and one with a very high data change rate. When we got into it, we realized that high change rate was putting a strain on the box. So we backed out using the service enabler and presented directly to the host until we made modifications and brought the service back in-band again.
TUNING APPS: We have been able to pass the change rate information to our development organization so it can tune the application. Prior to this, we didn't have visibility into the change rate of an application and it wasn't something we looked into -- the performance of the arrays was fine, and reads and writes completed within milliseconds, so everything fell within the right parameters. But once we put this in place, we saw how much data actually is changing, and how that affects how much we replicate to the disaster recovery site.
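To see why change rate matters for replication, a back-of-the-envelope calculation helps. The sketch below assumes decimal units (1 GB = 1,000 MB) and ignores compression, deduplication and protocol overhead; the function names and the 5% daily change rate in the usage note are hypothetical, not figures from Vertafore.

```python
def daily_change_gb(total_gb, daily_change_fraction):
    """Amount of data that changes per day, in GB."""
    if not 0 <= daily_change_fraction <= 1:
        raise ValueError("daily_change_fraction must be between 0 and 1")
    return total_gb * daily_change_fraction


def required_replication_mbps(changed_gb_per_day, replication_window_hours=24.0):
    """Average link speed (megabits per second) needed to ship one day's changes
    within the given window. Assumes decimal units and no compression."""
    if replication_window_hours <= 0:
        raise ValueError("replication_window_hours must be positive")
    changed_megabits = changed_gb_per_day * 1000 * 8  # GB -> MB -> megabits
    return changed_megabits / (replication_window_hours * 3600)
```

For a host with about two terabytes of data and an assumed 5% daily change rate, daily_change_gb(2000, 0.05) gives 100 GB, and replicating that around the clock needs a sustained average of roughly 9.3 Mbps. Squeeze the same volume into an eight-hour window and the requirement roughly triples.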
UNEXPECTED FLEXIBILITY: This really opens up flexibility in backup and recovery operations but also in storage management from the array side. If you have one LUN on an array that you present to the host and you need to grow the LUN but you're out of space on that array, you can expand the LUN between two arrays using virtualization. And on the back end, you can use storage from multiple vendors and the host won't know the difference.
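One simple way a virtualization layer can stretch a LUN across two arrays is concatenation: logical blocks fill the first array's extent, then continue on the second. The sketch below illustrates that general idea only. The article does not say how FalconStor lays out data internally, and the array names, extent sizes and function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_block(extents, logical_block):
    """Map a logical block number on a concatenated virtual LUN to
    (array_name, block_offset_within_that_extent).

    extents is an ordered list of (array_name, size_in_blocks) tuples.
    """
    # Cumulative end offsets, e.g. sizes [1000, 500] -> [1000, 1500].
    ends = list(accumulate(size for _, size in extents))
    if not ends or not 0 <= logical_block < ends[-1]:
        raise IndexError("logical block is outside the virtual LUN")
    index = bisect_right(ends, logical_block)  # first extent ending past the block
    start = ends[index - 1] if index else 0
    return extents[index][0], logical_block - start
```

With extents [("array_a", 1000), ("array_b", 500)], block 999 lands on array_a and block 1000 becomes block 0 on array_b, while the host simply sees one 1,500-block LUN.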