Poor test data costs industry billions per year

The longer it takes to find bugs, the more expensive it is to fix them

Credit: Alberto G. (CC 2.0)

"Poor software quality costs $150+ billion per year in the US and over $500 billion worldwide," according to Capers Jones. Many of those software quality issues come from poor test data quality. According to NIST, the average testing team spends between 30 and 50 percent of its time setting up test environments rather than on actual testing, and an estimated 74 percent of projects suffer significant delays or quality issues.

Fortunately, there is a solution to the test data quality problem: a technology called virtual data. Just as virtual machines create virtual copies of physical computing resources, virtual data creates multiple lightweight virtual copies of data from a single, full-size copy.

The problem with test data is that fully testing whether code is ready for production requires a parity copy of production data, yet creating full parity copies of production data is too onerous for most QA teams to manage. To make test data easier to manage in development and QA, teams often use subsets of production data, and only run the code against a full-size copy of production data just before release. This final testing might happen in the last weeks of a multi-month project, and it typically flushes out more bugs than can be fixed before the release date. The release then has to be either delayed or shipped with bugs.

[Figure: subset data in dev and QA — number of late-stage bugs found]
The longer it takes to find bugs, the more expensive it is to fix them, as Barry Boehm pointed out in his seminal book Software Engineering Economics (1981).

[Figure: bugs become more expensive to fix over time]

One reason fixing bugs costs more the longer they go undetected is that it becomes harder for the developer to recall the code that needs to be fixed and the context around it. Conversely, if a developer finds a bug immediately after writing the code, the context is fresh and debugging is much faster.
 
What we actually want is for developers and testers to have access to parity copies of production data immediately.
 
[Figure: production parity data in dev — bugs found early. Credit: Kyle Hailey]

Now the question is: how do we supply development and QA with production parity data? Physically copying data from production is too time-consuming, resource-intensive and personnel-heavy to be acceptable.

The solution is using virtual data.

Virtual data copies initially take up no space, consuming space only as changes are made to them. Because they require no initial space, they can be created in minutes. Virtual data copies rely on an initial full copy of the data, and all subsequent copies share that copy's unchanged data blocks. Such block sharing has been possible for 20 years with file system snapshot technology, but snapshots have rarely been used in this context for two reasons. First, provisioning copies requires coordinating DBAs, storage admins, system administrators and others. Second, tracking which snapshots have been taken, which can be purged, and which are in use by current data copies is burdensome. Virtual data eliminates these problems by encapsulating all the management and steps into an automated, self-contained technology.
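The block-sharing idea above can be illustrated with a minimal sketch. This is not any vendor's implementation, just a hypothetical copy-on-write model: each virtual copy starts empty, reads unchanged blocks from the shared base image, and stores only the blocks it modifies.

```python
class BaseImage:
    """Full-size source copy: an indexed collection of data blocks."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

class VirtualCopy:
    """Lightweight copy: zero space up front, stores only its own changes."""
    def __init__(self, base):
        self.base = base
        self.delta = {}  # block index -> privately modified block

    def read(self, i):
        # Serve the private block if this copy changed it, else the shared one
        return self.delta.get(i, self.base.blocks[i])

    def write(self, i, block):
        # Copy-on-write: space grows only when a block is modified
        self.delta[i] = block

base = BaseImage(["blk0", "blk1", "blk2"])
dev = VirtualCopy(base)   # created instantly; no blocks duplicated
qa = VirtualCopy(base)

dev.write(1, "blk1-patched")
print(dev.read(1))  # dev sees its private change
print(qa.read(1))   # qa still shares the untouched base block
```

Many virtual copies can hang off one base image this way, which is why provisioning a new copy is fast and why total storage grows only with the changes each copy makes.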

Find out more by looking into vendors such as Actifio, Delphix and Oracle with Snap Clone.

This article is published as part of the IDG Contributor Network.
