Fix data before warehousing it

By Marty Moseley, Network World | Storage, data management, data warehouse

Many data warehouse operators have attempted to implement Master Data Management to improve data quality, but most have focused on mastering data after transactions occur. This approach does little to improve quality because data is "fixed" after the fact. The best way to improve data quality is to move the process upstream of the data warehouse and before transactions are executed.

One goal of MDM is to prevent bad data from entering the ecosystem at all by enforcing data quality at the edges of the IT ecosystem. For example, when creating a sales order, the transactional system should guarantee that customer data on the order is not duplicated, and that address data is correct and current. The same checks should apply to the product data, prices, discounts, and payments on the order.
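As a minimal sketch of such an edge check, the following assumes a small in-memory set of master records and invented field names (`customer_id`, `postal_code`, `sku` and the rest are illustrative, not taken from any particular MDM product). The order is checked against master data before it is accepted:

```python
from dataclasses import dataclass

# Hypothetical master records; a real system would fetch these from an MDM service.
MASTER_CUSTOMERS = {
    "C-100": {"name": "Acme Corp", "postal_code": "78701", "active": True},
}
MASTER_PRICES = {"SKU-1": 25.00}


@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: float


@dataclass
class SalesOrder:
    customer_id: str
    postal_code: str
    lines: list


def validate_order(order):
    """Return a list of data-quality problems; an empty list means the order may proceed."""
    problems = []

    # Customer checks: the customer must exist, be active, and match the master address.
    customer = MASTER_CUSTOMERS.get(order.customer_id)
    if customer is None:
        problems.append("unknown customer " + order.customer_id)
    elif not customer["active"]:
        problems.append("inactive customer " + order.customer_id)
    elif customer["postal_code"] != order.postal_code:
        problems.append("address does not match master record")

    # Product and price checks for every line on the order.
    for line in order.lines:
        master_price = MASTER_PRICES.get(line.sku)
        if master_price is None:
            problems.append("unknown product " + line.sku)
        elif abs(master_price - line.unit_price) > 0.005:
            problems.append("price mismatch for " + line.sku)
        if line.quantity <= 0:
            problems.append("non-positive quantity for " + line.sku)
    return problems
```

A transactional system would call `validate_order` before committing the order and reject or route for correction anything that returns a non-empty list.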

The system should also enforce business rules and policies and guarantee that the transaction is correct. Master data provides an enterprise-wide view of data, whereas transactional systems only have insight into the data they contain. Since no single source contains the sum total of master or reference data required to guarantee enterprise-wide data quality, each provides only a partial solution. This is why MDM can play a critical role.

If MDM is implemented using a real-time architecture, based on the principles of service-oriented architecture (SOA), then a service that enforces the business rules and policies governing each kind of master data can be made available to all transactional systems.

For example, a service that manages the uniqueness and quality of customer data can be created to ensure that a customer is created only once and made available to any system of record whenever a transaction occurs. This satisfies the need to enforce data quality at the edges of the ecosystem. The same applies to any master data, including data about people and organizations of interest; products and related catalog and pricing data; locations of customers, products, and company assets; and calendars of events.
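The "create a customer only once" idea can be sketched as a get-or-create operation. This is a toy in-memory stand-in, not a real MDM product API: the class name `CustomerMasterService`, the `match_key` helper and the exact-match-after-normalization rule are all assumptions made for illustration. Production matching is typically fuzzier than this.

```python
import itertools


def match_key(name, email):
    """Normalize identifying fields so trivially different spellings compare equal."""
    return (" ".join(name.lower().split()), email.strip().lower())


class CustomerMasterService:
    """Toy in-memory stand-in for a customer master data service (hypothetical)."""

    def __init__(self):
        self._id_by_key = {}   # normalized match key -> customer ID
        self._records = {}     # customer ID -> stored record
        self._ids = itertools.count(1)

    def get_or_create(self, name, email):
        """Return the existing customer ID for a match, otherwise register a new one."""
        key = match_key(name, email)
        if key in self._id_by_key:
            return self._id_by_key[key]
        customer_id = "C-%d" % next(self._ids)
        self._id_by_key[key] = customer_id
        self._records[customer_id] = {
            "name": " ".join(name.split()),
            "email": email.strip().lower(),
        }
        return customer_id

    def lookup(self, customer_id):
        """Return the master record for an ID, or None if it is unknown."""
        return self._records.get(customer_id)
```

Every transactional system would call the same `get_or_create` instead of inserting its own customer row, so two systems that capture "Acme  Corp" and "acme corp" with the same e-mail address end up referring to one customer ID.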

These services, which can be called Master Data Services (MDS), guarantee the integrity of the most critical data within every transactional and data capture system in the organization. Each MDS can be an autonomous, decoupled service that is individually scalable and managed to ensure the quality of the domain of master data it manages.

One of the prime services provided by an Enterprise Service Bus (ESB) should be an MDS that serves as the real-time "single version of the truth" for a specific kind of master data. This type of service contributes significant business quality improvements because it provides data to the data warehouse and to all IT systems that consume or refer to master data.

Properly implemented, an MDS dramatically improves data quality, which reduces the size, scope and complexity of data warehouse architectures. If each transaction is complete and accurate before it enters the data warehouse, the work required to integrate it is greatly reduced. The result is a decrease in the cost of managing data quality and a significant reduction in the brittleness of the data warehouse data chain.

Rethinking the data warehouse data chain and getting transactional details from the message bus streamlines the storage, replication and movement of data. The ultimate result is that scores of problem areas and risks will be eliminated, costs and complexity will be reduced and there will be fewer sources of errors.
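To illustrate what "getting transactional details from the message bus" might look like, here is a minimal in-process publish/subscribe sketch. The `MessageBus` and `WarehouseLoader` classes, the `orders.validated` topic name and the message fields are all hypothetical; a real ESB would add durable delivery, routing and error handling.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-process publish/subscribe bus (a stand-in for a real ESB)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(message)


class WarehouseLoader:
    """Records already-validated transactions as they arrive on the bus."""

    def __init__(self):
        self.fact_rows = []

    def on_order(self, order):
        self.fact_rows.append(
            (order["order_id"], order["customer_id"], order["total"])
        )


bus = MessageBus()
loader = WarehouseLoader()
bus.subscribe("orders.validated", loader.on_order)

# A transactional system publishes the order only after it has passed validation.
bus.publish(
    "orders.validated",
    {"order_id": "O-1", "customer_id": "C-1", "total": 50.0},
)
```

Because the loader only ever sees messages that were validated upstream, it has no cleansing or deduplication step of its own, which is the simplification the article argues for.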

Here are some recommendations to make it happen:

Organizations should not try to accomplish all of the above in one massive project. Chances are high that in attempting to implement most of these ideas at once, the MDM project will become too encumbered to produce results in a timely fashion. Practical experience has shown that an MDS can be implemented and in production in less than nine months.

Migrate in an evolutionary fashion, not a revolutionary one. Start by replacing one part of a data warehousing architecture with a more SOA-compliant design. This will allow the MDM system to produce results, without forcing change on the data warehouse.

Master only one domain at a time (customer data, product data, asset data, account data, portfolio data). Don't try to master everything or generally do too much at once. It is ultimately better to show a win in a production system in less time and then be asked to repeat the success in other master data domains than to overload a project and never produce results.
