Cloud migration: Getting data to the cloud is the elephant in the room

The main hurdle to migrating enterprise applications to the cloud is getting the data to the cloud. Here's how to do it even when that data runs into multiple terabytes.


There is plenty of hype about how great the cloud is, but no one is addressing the elephant in the room: How do you move an enterprise application to the cloud? It's one thing if you have a WordPress site, which is sort of the "hello world" of the cloud. It's quite another if you are moving an Oracle ERP system to the cloud.

Cloud computing offers compelling benefits, but migrating to the cloud faces three big hurdles: moving data to the cloud, securing data in the cloud, and maintaining stability and performance in the cloud. All of these pain points can be overcome or mitigated by a technology called copy data virtualization.

IT leaders are concerned that cloud providers oversimplify their solutions and fail to appreciate both the complexity of potential customers’ applications and those customers’ fears that a migration could fail. Despite the seemingly insurmountable obstacles that enterprise cloud migrations present, you would be hard-pressed to find a business reliant on its data systems that did not recognize and desire the tactical benefits of the cloud. The will to migrate is there.

The main hurdle to migrating to the cloud is getting the data to the cloud. How can we migrate an application to the cloud when the data used by that application often runs into multiple terabytes? Trying to pump multiple terabytes from in-house IT over the internet to a cloud provider can take days and saturate networks. Often the most expedient way to get large data sets into the cloud is to ship the physical media to the cloud provider. Physical shipping is hardly an agile approach in today's age of speed, agility and innovation.

The task is further complicated by the ongoing need to propagate changes from in-house databases so that databases in the cloud are refreshed with the newest, most accurate data for use in development and QA. On top of that, these applications and databases hold sensitive data, such as credit card numbers, Social Security numbers and health care information, that needs to be masked before the refreshed data is propagated to the cloud. Luckily there is a solution, and that solution is called copy data virtualization.
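To put the "can take days" claim into rough numbers, here is a minimal back-of-the-envelope sketch. The link speed (100 Mbps) and the 70 percent usable-bandwidth figure are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def transfer_days(terabytes: float, megabits_per_second: float,
                  utilization: float = 0.7) -> float:
    """Estimate the days needed to push `terabytes` over a network link.

    `utilization` is an assumed fraction of the link that is actually
    usable for the transfer (protocol overhead, other traffic).
    """
    bits = terabytes * 1e12 * 8                       # decimal terabytes to bits
    effective_bps = megabits_per_second * 1e6 * utilization
    seconds = bits / effective_bps
    return seconds / 86_400                           # seconds per day


# Compare a few data set sizes over an assumed 100 Mbps uplink.
for size_tb in (3, 10, 90):
    print(f"{size_tb:>3} TB over 100 Mbps: {transfer_days(size_tb, 100):.1f} days")
```

Under these assumptions even a few terabytes occupy the link for days, which is why reducing the volume of data to move matters more than tuning the transfer.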

Copy data virtualization

Copy data virtualization (CDV) is a technology that combines data masking and thin cloning with compression and change tracking. By tracking the changes to all data blocks on a storage system and sharing any duplicate data blocks between data copies, CDV enables copies of data to be made almost instantly, because no data is actually copied. A new copy of data is simply a new set of pointers to existing data. Copies of large databases can be made in minutes and take almost no additional storage. Once the virtual copy has been made, any changes to that new copy are tracked separately from the original and kept private to the copy. Along with the massive savings from sharing duplicate blocks, copy data virtualization incorporates data compression, enabling even greater storage savings.
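The pointer idea can be illustrated with a toy model. This is a minimal sketch of thin cloning with copy-on-write, assuming an in-memory block store; the class and method names are hypothetical and do not correspond to any particular CDV product.

```python
class BlockStore:
    """Shared pool of data blocks; each block is stored once and addressed by id."""

    def __init__(self):
        self.blocks = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self.blocks[block_id] = data
        self._next_id += 1
        return block_id


class VirtualCopy:
    """A copy is only a list of pointers (block ids) into the shared store."""

    def __init__(self, store: BlockStore, pointers):
        self.store = store
        self.pointers = list(pointers)

    def clone(self) -> "VirtualCopy":
        # No block data is copied, only the pointer list.
        return VirtualCopy(self.store, self.pointers)

    def read(self, index: int) -> bytes:
        return self.store.blocks[self.pointers[index]]

    def write(self, index: int, data: bytes) -> None:
        # The changed block goes to a new location that only this copy points to.
        self.pointers[index] = self.store.put(data)


store = BlockStore()
source = VirtualCopy(store, [store.put(b"block-%d" % i) for i in range(4)])

qa = source.clone()           # near-instant: four pointers, zero new blocks
qa.write(2, b"qa-change")     # private to the QA copy

print(source.read(2), qa.read(2), len(store.blocks))
```

The source still sees its original block, the QA copy sees its own change, and the store holds five blocks rather than eight.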

How does copy data virtualization software help with migrations to the cloud? It has three key components that enable cloud migration:

  1. Single copy of all duplicate data blocks
  2. Compression of unique data blocks
  3. Streaming replication of changed data blocks

The first point, sharing all duplicate data, has the most impact. For example, if there is a 9 TB source database with 10 copies of it for QA and development, then migration to the cloud would normally require copying 10 databases totaling 90 TB. With CDV we would instead move just a single compressed copy of the database along with any changes unique to each of the copies. The total would typically be around 10 TB before compression and about 3 TB after. Thus the total amount of data to copy drops from 90 TB to only 3 TB.
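The arithmetic behind the 90 TB versus 3 TB example can be written out as follows. The 1 TB of unique changes and the 0.3 compression factor are assumptions chosen to match the approximate figures above; real ratios depend on the data.

```python
SOURCE_TB = 9             # size of the source database
COPIES = 10               # QA and development copies
UNIQUE_CHANGES_TB = 1     # assumed: changes unique to the copies, in total
COMPRESSED_FRACTION = 0.3 # assumed: compressed size is about 30% of the original

# Traditional migration: every copy is moved in full.
full_copy_tb = SOURCE_TB * COPIES

# CDV: one shared copy of the blocks plus the changes unique to each copy.
virtual_tb = SOURCE_TB + UNIQUE_CHANGES_TB

# Compression is applied to what remains.
compressed_tb = virtual_tb * COMPRESSED_FRACTION

reduction = 1 - compressed_tb / full_copy_tb

print(f"full copies: {full_copy_tb} TB")
print(f"shared blocks + changes: {virtual_tb} TB")
print(f"after compression: {compressed_tb:.1f} TB")
print(f"reduction: {reduction:.1%}")
```

Note that most of the saving comes from sharing duplicate blocks (90 TB down to 10 TB); compression contributes the final step from 10 TB to roughly 3 TB.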

Now 3 TB can still be a lot of data to move into the cloud, depending on the network bandwidth. This is where the third point, streaming replication, comes into play. With streaming replication, the virtualization can take place on premises and the data can then be replicated to the cloud. CDV software typically integrates replication. Replication runs in a managed manner: transfers can be limited to hours of low usage to avoid network congestion, and the transfer can run automatically over days. Once a first full replication is complete, only the blocks that change on premises need to be replicated to the cloud, and these changes are in turn compressed to keep network bandwidth usage low.
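The replication behavior described above (track changed blocks, compress them, send only during quiet hours) might be sketched like this. It is an illustrative toy, not a vendor implementation: the `ReplicationQueue` class, the 22:00 to 06:00 window and the `send` callback are all assumptions, and `zlib` stands in for whatever compression a real product uses.

```python
import zlib
from datetime import time


class ReplicationQueue:
    """Tracks changed blocks and ships them, compressed, inside a time window."""

    def __init__(self, window_start=time(22, 0), window_end=time(6, 0)):
        self.dirty = {}                 # block index -> latest data
        self.window_start = window_start
        self.window_end = window_end

    def record_write(self, index: int, data: bytes) -> None:
        # Only the latest version of a block needs to be replicated.
        self.dirty[index] = data

    def in_window(self, now: time) -> bool:
        if self.window_start <= self.window_end:
            return self.window_start <= now < self.window_end
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        return now >= self.window_start or now < self.window_end

    def drain(self, now: time, send) -> int:
        """Send all pending blocks if allowed; return compressed bytes sent."""
        if not self.in_window(now):
            return 0
        sent = 0
        for index, data in sorted(self.dirty.items()):
            payload = zlib.compress(data)
            send(index, payload)
            sent += len(payload)
        self.dirty.clear()
        return sent


cloud = {}  # stand-in for the cloud-side replica

def send(index, payload):
    cloud[index] = zlib.decompress(payload)

queue = ReplicationQueue()
queue.record_write(7, b"A" * 4096)
queue.record_write(7, b"B" * 4096)               # overwrites the pending change

print(queue.drain(time(12, 0), send))            # midday: outside the window
print(queue.drain(time(23, 30), send) < 4096)    # night: sent, and compressed
```

Keeping only the latest version of each changed block means a block rewritten many times during the day still crosses the network once.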

Copy data virtualization software provides an elegant solution to the above challenges for customers who need to migrate to the cloud quickly, efficiently and cost-effectively. Best of all, with replication it is easy both to migrate to the cloud and to migrate back in house, mitigating concerns about being locked into the cloud should issues such as poor performance or missed SLAs come up. Finally, the leading CDV solutions all have masking options, which can address data security concerns.

This article is published as part of the IDG Contributor Network.
