Hyperactive Data Quality (Second Edition)

JimHarris
Last updated: 2009/08/15 at 12:39 AM

In the first edition of Hyperactive Data Quality, I discussed reactive and proactive approaches using the data quality lake analogy from Thomas Redman’s excellent book Data Driven: Profiting from Your Most Important Business Asset:

“…a lake represents a database and the water therein the data. The stream, which adds new water, is akin to a business process that creates new data and adds them to the database. The lake… is polluted, just as the data are dirty. Two factories pollute the lake. Likewise, flaws in the business process are creating errors…

One way to address the dirty lake water is to clean it up… by running the water through filters, passing it through specially designed settling tanks, and using chemicals to kill bacteria and adjust pH.

The alternative is to reduce the pollutant at the point source – the factories.

The contrast between the two approaches is stark. In the first, the focus is on the lake; in the second, it is on the stream. So too with data. Finding and fixing errors focuses on the database and data that have already been created. Preventing errors focuses on the business processes and future data.”

Reactive Data Quality

Reactive Data Quality (i.e., “cleaning the lake” in Redman’s analogy) focuses entirely on finding and fixing problems in existing data after it has been extracted from its sources.

An obsessive-compulsive quest to find and fix every data quality problem is a laudable but ultimately unachievable pursuit (even for expert “lake cleaners”). Data quality problems can be insidious, and even the best “lake cleaning” process will still produce exceptions. Your process should be designed to identify and report exceptions when they occur. As a best practice, it should also be able to suspend incoming data that contain exceptions for manual review and correction.
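
As a rough illustration of that suspend-for-review practice, here is a minimal Python sketch. The record fields (`customer_id`, `email`), the two rules, and the names `find_exceptions` and `load_batch` are all hypothetical, chosen only for this example; a real process would apply whatever rules fit its own data.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    accepted: list = field(default_factory=list)
    suspended: list = field(default_factory=list)  # (record, reasons) pairs


def find_exceptions(record):
    """Return the rule violations for one record (hypothetical rules)."""
    reasons = []
    if not str(record.get("customer_id", "")).strip():
        reasons.append("missing customer_id")
    email = record.get("email") or ""
    if "@" not in email:
        reasons.append("email has no @")
    return reasons


def load_batch(records):
    """Accept clean records; suspend the rest for manual review."""
    result = LoadResult()
    for record in records:
        reasons = find_exceptions(record)
        if reasons:
            # Suspended records are held back together with the reasons,
            # so a reviewer can see why each one was stopped.
            result.suspended.append((record, reasons))
        else:
            result.accepted.append(record)
    return result


batch = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "", "email": "bob@example.com"},
    {"customer_id": "C003", "email": "not-an-email"},
]
result = load_batch(batch)
for record, reasons in result.suspended:
    print("SUSPENDED:", record, "->", "; ".join(reasons))
```

The point of the design is that exceptions are reported and held, not silently dropped or silently loaded: the clean record flows through, while the two problem records wait for a person to correct them.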

Proactive Data Quality

Proactive Data Quality focuses on preventing errors at the sources where data is entered or received, before it is extracted for use by downstream applications (i.e., before it “enters the lake” in Redman’s analogy).

Redman describes the benefits of proactive data quality with what he calls the Rule of Ten:

“It costs ten times as much to complete a unit of work when the input data are defective (i.e., late, incorrect, missing, etc.) as it does when the input data are perfect.”
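
To see what the Rule of Ten implies in aggregate, the arithmetic can be sketched in a few lines of Python. The function name `relative_cost` and the sample defect rates are assumptions made for illustration; only the tenfold multiplier comes from Redman’s rule.

```python
def relative_cost(defect_rate, penalty=10.0):
    """Average cost per unit of work, relative to the cost with perfect input.

    penalty=10.0 is the Rule of Ten multiplier. defect_rate is the fraction
    of units of work whose input data are defective (illustrative values).
    """
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    # Clean units cost 1x; defective units cost `penalty` times as much.
    return (1.0 - defect_rate) * 1.0 + defect_rate * penalty


for rate in (0.0, 0.05, 0.10, 0.25):
    print(f"defect rate {rate:.0%}: {relative_cost(rate):.2f}x baseline cost")
```

Under these assumptions, a defect rate of 10% nearly doubles the average cost of the work (0.9 × 1 + 0.1 × 10 = 1.9 times the baseline).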

Proactive data quality advocates reevaluating business processes that create data, implementing improved controls on data entry screens and web forms, enforcing the data quality clause (you have one, right?) of your service level agreements with external data providers, and understanding the information needs of your consumers before delivering enterprise data for their use.
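
One of those measures, improved controls on data entry screens and web forms, might look something like the following Python sketch. The form fields (`name`, `email`, `birth_date`), the messages, and the function `validate_signup_form` are hypothetical, and the email pattern is deliberately simple; this is an illustration of checking data at the point of entry, not a complete validator.

```python
import re
from datetime import date

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup_form(form):
    """Return {field: message} for every problem; an empty dict means OK."""
    errors = {}
    if not form.get("name", "").strip():
        errors["name"] = "Name is required."
    if not EMAIL_PATTERN.match(form.get("email", "").strip()):
        errors["email"] = "Enter an email address like name@example.com."
    try:
        born = date.fromisoformat(form.get("birth_date", ""))
        if born > date.today():
            errors["birth_date"] = "Birth date cannot be in the future."
    except ValueError:
        # Raised for empty or non-ISO input such as "17/05/1980".
        errors["birth_date"] = "Use the format YYYY-MM-DD."
    return errors


good = validate_signup_form(
    {"name": "Ann", "email": "ann@example.com", "birth_date": "1980-05-17"}
)
bad = validate_signup_form(
    {"name": " ", "email": "ann@", "birth_date": "17/05/1980"}
)
print(good)
print(bad)
```

Because the person entering the data sees the messages immediately, the error is corrected by the one person who knows the right value, before the record ever reaches a downstream application.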

Proactive Data Quality > Reactive Data Quality

Proactive data quality is clearly the superior approach. Although it is impossible to truly prevent every problem before it happens, the more control that can be enforced where data originates, the better the overall quality of enterprise information will be.

Reactive data quality essentially treats the symptoms without curing the disease. As Redman explains: “…the problem with being a good lake cleaner is that life never gets better… it gets worse as more data…conspire to mean there is more work every day.”

So why do the vast majority of data quality initiatives use a reactive approach?

An Arrow Thickly Smeared With Poison

In Buddhism, there is a famous parable:

A man was shot with an arrow thickly smeared with poison. His friends wanted to get a doctor to heal him, but the man objected by saying:

“I will neither allow this arrow to be pulled out nor accept any medical treatment until I know the name of the man who wounded me, whether he was a nobleman or a soldier or a merchant or a farmer or a lowly peasant, whether he was tall or short or of average height, whether he used a long bow or a crossbow, and whether the arrow that wounded me was hoof-tipped or curved or barbed.” 

While his friends went off in a frantic search for these answers, the man slowly and painfully died.

“Flight to Data Quality”

In economics, the term “flight to quality” describes the aftermath of a financial crisis (e.g., a stock market crash) when people become highly risk-averse and move their money into safer, more reliable investments.

A similar “flight to data quality” can occur in the aftermath of an event in which poor data quality negatively impacted decision-critical enterprise information. Examples include a customer service nightmare, a regulatory compliance failure, or a financial reporting scandal.

Driven by a business triage for critical data problems, reactive data cleansing is purposefully chosen over proactive defect prevention. The priority is finding and fixing the near-term problems rather than worrying about the long-term consequences of not identifying the root cause and implementing process improvements that would prevent it from happening again.

The enterprise has been shot with an arrow thickly smeared with poison – poor data quality. Now is not the time to point out that the enterprise has actually shot itself by failing to have proactive measures in place. 

Reactive data quality only treats the symptoms. However, during triage, the priority is to stabilize the patient. A cure for the underlying condition is worthless if the patient dies before it can be administered.

Hyperactive Data Quality

Proactive data quality is the best practice. Root cause analysis, business process improvement, and defect prevention will always be more effective than the endlessly vicious cycle of reactive data cleansing. 

A data governance framework is necessary for proactive data quality to be successful. Patience and understanding are also necessary. Proactive data quality requires a strategic organizational transformation that will not happen easily or quickly. 

Even when not facing an immediate crisis, the reality is that reactive data quality will occasionally be a necessary evil that is used to correct today’s problems while proactive data quality is busy trying to prevent tomorrow’s problems.

Just like any complex problem, data quality has no fast and easy solution. Fundamentally, a hybrid discipline is required that combines proactive and reactive aspects into an approach that I refer to as Hyperactive Data Quality, which will make the responsibility for managing data quality a daily activity for everyone in your organization.

Please share your thoughts and experiences.

Related Posts

Hyperactive Data Quality (First Edition)

The General Theory of Data Quality
