Tuesday, October 13, 2009

Implementing an Analytical MDM


Last week I was at the DataFlux IDEAS conference. Without a doubt, the biggest attraction from a product perspective was Project Unity, which is a joint effort between DataFlux and its parent company SAS. Unity is a next-generation enterprise data management platform encompassing data quality, data integration and master data management (MDM). As a DataFlux user for years, I can certainly say that Unity will vastly improve existing functionality, add outstanding new features, and ameliorate already ease of use interface.

I also had a chance to take a look into their existing qMDM for Customer Data product. qMDM helps you integrate multiple customer data sources into a single structure. Within the resulting centralized repository, pre-defined cleansing/standardization/enrichment rules can be customized as needed to provide the proper level of consolidation. Finally, cluster of customer records can be mapped properly into hierarchies via a web-driven interface for a unified view of customer information.

Too bad qMDM wasn't available a few years ago, when I was working on an Analytical MDM implementation for Customer Data. It would have certainly saved me a lot of time and grief. I did use DataFlux, but had to write logic that is now available out-of-the-box with qMDM.

This posting is not about describing qMDM. The short presentation I saw doesn't qualify me for that. However, I will describe in high level terms what we implemented a few years ago. It should be generic enough that you can adapt to your particular situation, whether you develop everything in-house or use a vendor tool. Remember that even if you use a tool, from DataFlux or not, quite a bit of customization may be required depending on your business needs.


Our objective with this project was to improve Business Intelligence and Analytics as it relates to customer. Needless to say, customer information was the primary driver. As such, we needed a unified view of the customer information throughout the entire organization to achieve the so elusive 360-degree view of the customer.

Customer data exists in multiple system across the enterprise. So does the transaction data associated with them. We had to bring both customer and transaction information into a single repository that we could use. By cleansing, standardizing, enriching and consolidating customer data, we could better make sense of the associated transaction information and use the results for predictive analytics and reporting.

For example, understanding that a particular customer on a given industry sector had purchased a particular hardware configuration with combined storage, software, and services could lead to cross-sell opportunities for other customers in the same industry sector. There wasn't a single system holding all the transaction information associated to a customer, meaning we had to bring data together from multiple systems to make that determination. Furthermore, even when data was all stored on a particular repository, there was a fair amount of customer data duplication, making it difficult to understand all the associations.

In essence, fragmentation and duplication was preventing us from realizing the opportunity we had to really use our data as an asset.


There are 4 main logical components in this project:
1. Data Extract Process
2. Customer Data Cleansing, Standardization, Enrichment and Consolidation
3. Customer Hierarchy Mapping
4. Analytics and Reporting

Activities performed by each component is well defined and can be quite complex. Multiple tools are used, and most of the process is automated. However, some steps require human intervention, and reports and analytics are constantly evolving.

Here is a list of tools/technologies used: Oracle RDMS, Oracle PL/SQL, DataFlux, D&B, Java, CORDA, Oracle ODM, SAS, BRIO. In my diagrams, I have added a logo indicating which technology is used for each activity.

Data Extract Process

Before we can analyze our data, we need to bring it together. This is the first step on a Customer Data Integration (CDI) project, where multiple sources containing disparate information are queried and the relevant information loaded into a single repository.

I like separating the data into two parts: the Customer Master Data or Customer Identity, and the Transaction Data. Master Data is the persistent, non-transactional data defining a business entity for which there should be one consistent and understood view across the organization. That is one of the things we are achieving with this project. By having the identity separated from the transaction, we can use tools and techniques to consolidate it without losing the keys that map the data back to the transaction piece.

In any event, in this first component, we use Oracle stored procedures to extract/transform/load the disparate data from multiple sources into a common structure. For the Customer Identity Data tables, any system specific idiosyncrasies are eliminated during this step. Therefore, following steps are applied independent of the source.

Customer Data Cleansing, Standardization, Enrichment and Consolidation

The ultimate goal in this step is to eliminate duplicates. But the data is dirty and even though they conform to a common structure at this point, they don't necessarily conform to the same standard. Address lines could be mixed, customer names non-complaint, missing information, etc.

We use D&B to enrich data with DUNS number, and standard company and address information. We also use DataFlux to standardize company information, cleanse data fields, right-field address information, etc.

Once data is cleansed and standardized, we use DataFlux and fuzzy matching to consolidate similar records into clusters. This is fully automated, and as such, we want to minimize risk. Our clustering is conservative, purposely leading to false negatives, but avoiding false positives.

Customer Hierarchy Mapping

After data is consolidated, we have a much lower number of records to map into hierarchies. To be precise, the number of records is the same, but since they are in clusters, you only need to map the cluster, and every record under it is automatically mapped.

Customer hierarchy allows for rollups, facilitating the interpretation and reporting of transactions associated to a group of customer records.

Some records are automatically mapped by DataFlux by means of getting grouped into an already mapped cluster. A group of hierarchy managers use a java web-driven application to manually map records into hierarchies.

Analytics and Reporting

This is where everything comes together. We now have hierarchical and consolidated Customer Master Data. We also have transaction information. We can use Analytics and Data Mining tools to combine the two for Business Intelligence. Dashboards using CORDA and ad-hoc reports with BRIO complement the suite.


Pretty easy, isn't it? This is not for the faint-of-heart, but once in operation can reap great benefits. Our “clients” (sales and marketing teams) are extremely satisfied with the results this implementation has provided.

Remember also that the data at the source is dynamic. Our process supports new records as well as changes to existing records. It gets quite complex because you can have an already mapped record modified at the source. There are different implications if the record is the survivor in its cluster or not, if it is mapped to a hierarchy or not, etc. But that deserves a total new posting.

1 comment:

  1. Hello,
    We facilitate the provision of independent analysis to support expert testimony, regulatory or legislative engagements. Frequently, this work includes economic, financial and statistical studies of varying data analysis, technical and http://www.potentiamed.com.