"Poor software quality costs $150+ billion per year in the US and over $500 billion worldwide" according to Capers Jones. Of those software quality issues, many come from poor test data quality. “According to NIST the average testing team spends between 30 and 50% of their time setting up test environments rather than on actual testing and the estimated number of projects with significant delays or quality issues is 74%."
Fortunately, there is a solution to the test data quality problem: a technology called virtual data. Just as virtual machines create virtual copies of physical computing resources, virtual data creates multiple lightweight virtual copies of data from a single, full-size copy.
The problem with test data is that fully testing whether code is ready for production requires a parity copy of production data, yet creating full parity copies of production data is too onerous for most QA teams to manage. To make test data easier to manage in development and QA, teams typically work with subsets of production data, then run the code against a full-size copy of production data only before the final release. That final testing may fall in the last weeks of a multi-month project, and it typically flushes out more bugs than can be fixed before the release date. The release then has to be either delayed or shipped with bugs.
The longer it takes to find a bug, the more expensive it is to fix, as Barry Boehm pointed out in his seminal book Software Engineering Economics (1981).
Now the question is "how do we supply development and QA with production parity data?" Physically copying data from production is too time consuming, resource demanding and personnel heavy to be acceptable.
The solution is using virtual data.
Virtual data copies initially take up no space, consuming space only as changes are made to them. Because they have no initial footprint, they can be created in minutes. Virtual copies rely on an initial copy of the data on which all subsequent copies are based, with unchanged data blocks shared across the copies. Such block sharing has been possible for 20 years with file system snapshot technology, but snapshots have rarely been used in this context, for two reasons. First, provisioning copies requires coordinating DBAs, storage admins, system administrators, and others. Second, tracking which snapshots have been taken, which can be purged, and which are in use by current data copies is burdensome. Virtual data eliminates these problems by encapsulating all the management steps in an automated, self-contained technology.
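The copy-on-write block sharing described above can be sketched in a few lines of Python. This is purely illustrative; real virtual data products operate at the storage-block level with far more machinery, and the names here (`VirtualCopy`, `space_used`, the sample block map) are hypothetical, not any vendor's API.

```python
class VirtualCopy:
    """Illustrative sketch: a virtual copy shares the base's data blocks
    and stores only the blocks it has changed (copy-on-write)."""

    def __init__(self, base_blocks):
        self._base = base_blocks  # shared, read-only full-size copy
        self._delta = {}          # private blocks: only writes land here

    def read(self, block_id):
        # Return the private version if this block was modified,
        # otherwise fall back to the shared base copy.
        return self._delta.get(block_id, self._base[block_id])

    def write(self, block_id, data):
        # Copy-on-write: the base is never touched, so every other
        # virtual copy built on it is unaffected.
        self._delta[block_id] = data

    def space_used(self):
        # Initial footprint is zero; space grows only with changes.
        return len(self._delta)


# One full-size "production" copy backs many lightweight virtual copies.
base = {0: b"orders", 1: b"customers", 2: b"invoices"}
qa_copy = VirtualCopy(base)
dev_copy = VirtualCopy(base)

dev_copy.write(1, b"customers-masked")

assert qa_copy.read(1) == b"customers"          # still reads the shared base
assert dev_copy.read(1) == b"customers-masked"  # sees its private block
assert qa_copy.space_used() == 0                # no writes, no space
assert dev_copy.space_used() == 1               # one changed block
```

The point of the sketch is the asymmetry: creating `qa_copy` and `dev_copy` costs nothing up front, and a write to one copy never propagates to the base or to its siblings.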
Find out more by looking into vendors such as Actifio, Delphix, and Oracle with its Snap Clone feature.
This article is published as part of the IDG Contributor Network.