Automate Data Remediation to Find Dirty Data Before Your Customers Do

April 6, 2011

I teach a graduate course on data warehousing at Northeastern University in Boston. Unlike the people I teach at clients’ sites or at conferences such as TDWI, most of my students have not actually worked in IT yet, never mind had hands-on experience with data warehousing and business intelligence.

This means I often have to go back to the basics. If I mention something like data remediation or rework, I’m sure to be asked what it is, why it matters, what causes it and what it has to do with enterprise data management (EDM).

What is data remediation?

The “what it is” is the easy part: the business needs accurate information, and that often requires going back to rework and fix data to eliminate data-quality issues. Data needs to be checked for completeness, conformity, consistency, duplicates, integrity, and accuracy.
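To make those checks concrete for students who have never seen them in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample records, the field names (`email`, `country`, `order_total`), the reference list of country codes, and the rule that a negative total is implausible. It covers completeness, conformity, consistency, duplicates, and a rough stand-in for accuracy; an integrity check (for example, that every order points to a customer that exists) would need a second table and is left out.

```python
import re
from collections import Counter

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "country": "US", "order_total": 120.0},
    {"id": 2, "email": "", "country": "US", "order_total": 75.5},
    {"id": 3, "email": "bob@example", "country": "usa", "order_total": -10.0},
    {"id": 1, "email": "ann@example.com", "country": "US", "order_total": 120.0},
]

# Assumed reference list and a deliberately simple email pattern.
VALID_COUNTRIES = {"US", "GB"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_records(rows):
    """Return a list of (record id, check name, detail) tuples."""
    issues = []
    id_counts = Counter(row["id"] for row in rows)
    reported_duplicates = set()
    for row in rows:
        rid = row["id"]
        # Completeness: required fields must not be empty.
        if not row["email"]:
            issues.append((rid, "completeness", "email is missing"))
        # Conformity: values must match the expected format.
        elif not EMAIL_PATTERN.match(row["email"]):
            issues.append((rid, "conformity", "email is malformed"))
        # Consistency: codes must come from the agreed reference list.
        if row["country"] not in VALID_COUNTRIES:
            issues.append((rid, "consistency", f"unknown country {row['country']!r}"))
        # Accuracy, approximated by a plausibility rule: totals cannot be negative.
        if row["order_total"] < 0:
            issues.append((rid, "accuracy", "order_total is negative"))
        # Duplicates: the same id should appear only once (reported once per id).
        if id_counts[rid] > 1 and rid not in reported_duplicates:
            issues.append((rid, "duplicates", f"id appears {id_counts[rid]} times"))
            reported_duplicates.add(rid)
    return issues


issues = check_records(records)
for issue in issues:
    print(issue)
```

The point of the sketch is that each check is a rule a machine can apply to every row on every load, which is exactly what a person eyeballing a spreadsheet cannot do reliably.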

A 2007 survey of more than 1,000 middle managers of large companies in the United States and United Kingdom conducted by Accenture revealed that “Middle managers spend more than a quarter of their time searching for information necessary to their jobs, and when they do find it, it is often wrong.” (The emphasis is mine.) What do you want to bet that this hasn’t changed since then?

Why is data remediation important?

The “why it matters” is tied into all the reasons why data quality matters. For my grad school students, I’d compare it to Six Sigma, something they’ve likely encountered in their management classes. The earlier in a process you find defects, the easier and less disruptive it is to fix them. YOU want to be the one to find the problems, not your customer. With auto manufacturing, defects found in design or manufacturing can be handled internally. But defects found by the customer mean expensive, embarrassing recalls that end up on the evening news. Even worse, in the food or pharmaceutical industry, defects can be deadly to customers.

It’s the same with data – if no one internally identifies quality problems and they then are found by the customer, you find yourself faced with a fire drill to fix them. You’re scrambling, your CIO is livid, and your image is blown along with the reputation of your CPM, BI or DW program.

That’s expensive and embarrassing, but what about the bad data no one finds that ends up being used to make business decisions? That could be damaging to your business.

External forces like government regulations can put a lot of pressure on finance departments to start data remediation projects. The business risks of not doing it include all the problems associated with poor data quality. But if they try to do it manually (which many do, surprisingly) they’re apt to miss a lot. Manual remediation takes longer and introduces complexities. In the scheme of EDM, data remediation should always be automated (e.g., with an ETL product) to improve accuracy and overall business operations.

Enterprise Data Management (EDM) is the proactive approach to ensuring data is transformed into consistent, accurate and timely business information. And if there are data issues, you can discover them with auditing and data lineage rather than scrambling through macros in spreadsheets. A patchwork of stopgap measures is costly, and you are not likely to solve data quality issues in a purely reactive manner, i.e., through data remediation alone.
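As a rough illustration of what auditing buys you, the sketch below shows an automated fix that records every change it makes. It is not how any particular ETL product works. The `AuditEntry` structure, the `standardize_country` rule, and the mapping of country variants are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class AuditEntry:
    """One logged change: which record, which field, which rule, old and new value."""
    record_id: int
    field: str
    rule: str
    before: str
    after: str


# Hypothetical mapping of known variants to the agreed country code.
COUNTRY_FIXES = {"usa": "US", "u.s.": "US", "uk": "GB"}


def standardize_country(rows, audit_log):
    """Return cleaned copies of rows, appending an AuditEntry for every change."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the source row is left untouched
        key = row["country"].strip().lower()
        if key in COUNTRY_FIXES and COUNTRY_FIXES[key] != row["country"]:
            new_row["country"] = COUNTRY_FIXES[key]
            audit_log.append(
                AuditEntry(row["id"], "country", "standardize_country",
                           row["country"], new_row["country"])
            )
        cleaned.append(new_row)
    return cleaned


source = [
    {"id": 1, "country": "usa"},
    {"id": 2, "country": "GB"},
    {"id": 3, "country": "UK"},
]
audit_log = []
cleaned = standardize_country(source, audit_log)

for entry in audit_log:
    print(entry)
```

When someone later asks why a report shows "GB" where the source system says "UK", the answer is a lookup in the audit log, not a hunt through spreadsheet macros.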

Let’s face it: you can’t fix a problem that you can’t find. And it’s worth noting that with data quality, finding the problems is paramount. You might not have to fix everything you find, but you sure better find it!

Data quality isn’t so much of a problem when you know where your problems lie. The time has come for both business and IT to address these issues through an EDM program.