Perfect Data and Other Data Quality Myths

August 25, 2009

A recent client experience reminds me of what I’ve always said about data quality: it isn’t the same as data perfection. After all, how could it be? A lot of people think that correcting data is a post-facto activity based on opinion and anecdotal problems. But it should be an entrenched process.

One drop of gasoline can pollute a thousand gallons of pure water, but data doesn’t work that way. The FDA, on the other hand, says that a single worm found in 10,000 pounds of cereal is perfectly fine. (Jill says this is “apocryphal,” but you get my point.)

People forget that the definition of data quality is data that’s fit for purpose. It conforms to requirements. You only have to look back at the work of Philip Crosby and W. Edwards Deming to understand that quality is about conformance to requirements. We need to understand the variance between the data as it exists and its acceptability, not its perfection.

Data quality gets so much attention because bad data gets in the way of getting the job done. If I want to send an e-mail to 10,000 customers and one customer’s zip code is unknown, it doesn’t prevent me from contacting the other 9,999 customers. That can amount to what in any CMO’s estimation is a very successful marketing campaign. The question should be: What data helps us get the job done?
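To make the "fit for purpose" idea concrete, here is a minimal sketch of how you might measure data against a requirement rather than against perfection. Everything in it is an assumption for illustration: the `Customer` fields, the `usable_count` and `is_acceptable` helpers, the made-up customer list, and especially the 99% threshold, which in practice the business would set for each purpose.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Customer:
    # Hypothetical record layout, not taken from any real system.
    email: Optional[str]
    zip_code: Optional[str]


def usable_count(records: Iterable[Customer],
                 requirement: Callable[[Customer], bool]) -> int:
    """Count the records that conform to the requirement for one purpose."""
    return sum(1 for record in records if requirement(record))


def is_acceptable(usable: int, total: int, threshold: float) -> bool:
    """Compare the usable share with an acceptability threshold (an assumption)."""
    return total > 0 and usable / total >= threshold


# Made-up data: 10,000 customers, one of them with an unknown zip code.
customers = [Customer(email=f"customer{i}@example.com", zip_code="00000")
             for i in range(10_000)]
customers[0].zip_code = None

# Requirement for a campaign that needs a zip code on every record it uses.
needs_zip = lambda c: c.zip_code is not None

usable = usable_count(customers, needs_zip)
acceptable = is_acceptable(usable, len(customers), threshold=0.99)
print(usable, acceptable)
```

The same data can pass one requirement and fail another. Raise the threshold to 1.0 and this list no longer qualifies, even though nothing about the data has changed.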

Our client is a regional bank that has retained Baseline to work with its call center staff. Customer service reps (CSRs) had been frustrated that they got multiple records for the same customer. They had to jump through hoops to find the right data, often while the customer waited on the phone or online. The problem wasn’t that the data was “bad”; it was that the CSRs could only use the customer’s phone number to look up the record. If the phone number was incorrect, the CSR couldn’t do her job, and as a result her compensation suffered. So data quality is very important to her, and to the bank at large.

Users are all too accustomed to complaining about data. The goal of data quality should be continuous improvement, ensuring a process is available to fix data when it’s broken. If you want to address data quality, focus energy on the repair process. As long as your business is changing—and I hope it is—its data will continue to change. Data requirements, measurements, and the reference points for acceptability will keep changing too. If you’re involved in a data quality program, think of it as job security.
