Data Error Inequality

March 16, 2011

On his data quality and management blog, my friend Henrik Liliendahl recently wrote an excellent post entitled “Good, fast, cheap – pick any two.” In his post, Henrik discusses the well-worn trade-off among doing things well, quickly, and for very little money. To quote Henrik:

Some data, especially those we call master data, is used for multiple purposes within an organization. Therefore some kind of real world alignment is often used as a fast track to improving data quality where you don’t spend time analyzing how data may fit multiple purposes at the same time in your organization. Real world alignment also may fulfill future requirements regardless of the current purposes of use.

As usual, Henrik is absolutely right and many consultants have heard the axiom on which his post is based.

A Very Simple Model

In my day, I have seen people grossly overreact to data quality or conversion issues. Generally speaking, I have seen three types of errors. Note that this is a very simple model and cannot possibly account for every type of scenario and potentially pernicious downstream effect:

Type of Error                              | Example                        | Should You Freak Out?
Master Record Error                        | Customer or Employee           | Probably
Important Characteristic Field Error       | Employee Address               | Kind of
Information or “Nice to Have” Field Error  | Customer backup contact number | No

The bottom line from the table above is that not all data issues are created equal. Brass tacks: missing or erroneous information in a customer, vendor, or employee master record is not the same as an “information-only” field that drives nothing. (For more information on this, see the MIKE2.0 Master Data Management Offering.)
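To make the three-tier model concrete, here is a minimal Python sketch of how flagged errors might be triaged by type before anyone reacts to the raw count. The field names, the `FIELD_SEVERITY` mapping, and the `triage` helper are all hypothetical, invented for illustration; a real project would derive the mapping from its own data dictionary.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    MAJOR = "master record error"
    MODERATE = "important characteristic field error"
    MINOR = "information-only field error"


# Hypothetical mapping from field name to error type, following the table above.
FIELD_SEVERITY = {
    "customer_id": Severity.MAJOR,
    "employee_id": Severity.MAJOR,
    "employee_address": Severity.MODERATE,
    "customer_backup_contact_number": Severity.MINOR,
}


def triage(flagged_fields, default=Severity.MODERATE):
    """Count flagged errors by severity; unmapped fields fall back to `default`."""
    return Counter(FIELD_SEVERITY.get(field, default) for field in flagged_fields)


# Made-up sample of flagged fields, for illustration only.
flagged = [
    "customer_backup_contact_number",
    "employee_address",
    "customer_backup_contact_number",
    "employee_id",
]
summary = triage(flagged)
for severity in Severity:
    print(f"{severity.name}: {summary[severity]}")
```

Defaulting unmapped fields to the middle tier is a judgment call in this sketch: it keeps unknown errors visible without automatically treating them as emergencies.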

Of course, not everyone understands this. I can think of one woman (call her Dorothy here) with whom I worked on an enormous ERP project. To her, all errors were major issues. For example, I remember when the consulting team of which I was a part ran conversion programs attempting to load more than one million historical records into the new payroll system. Something like 12,000 records were flagged as potential issues.

Do the math. That’s nearly a 99 percent accuracy rate, and the data coming out of the legacy system was much, much better than it would otherwise have been, thanks to a very sophisticated ETL tool created to minimize those errors. Further, the vast majority of those errors were “information only” soft edits from the vendor’s conversion program. That is, they weren’t really errors.
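For anyone who wants the arithmetic spelled out, here is a quick Python sketch. It assumes exactly one million records (the story says "more than one million," so this is a lower bound) and takes the "something like 12,000" figure at face value.

```python
# Assumption: exactly 1,000,000 records; the actual total was "more than one million".
total_records = 1_000_000
# Assumption: exactly 12,000 flagged; the text says "something like 12,000".
flagged_records = 12_000

flagged_rate = flagged_records / total_records
clean_rate = 1 - flagged_rate

print(f"flagged: {flagged_rate:.1%}, clean: {clean_rate:.1%}")
# prints "flagged: 1.2%, clean: 98.8%"
```

Because the true record count was higher than one million, the real flagged rate would have been slightly below 1.2 percent.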

Of course, Dorothy chose to focus on (in her view) the enormity of the 12,000 number. She did not want to hear explanations. While this irritated me (given how hard the team had been working to cleanse the client’s legacy data), I wasn’t all that surprised. Dorothy knew nothing about data management and this was her first experience managing a project anywhere near this scope.

Simon Says

Fight the urge to treat all errors and issues as equal. They are not. Take the time to understand the nuances of your data, your information management project’s constraints, and the links among different systems, tables, and applications. You’ll find that your team will respect you more if you invest a few minutes in separating major issues from non-issues.

Feedback

What say you?