SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

The Data Quality Goldilocks Zone

JimHarris

In astronomy, the habitable region of space where stellar conditions are favorable for life as it is found on Earth is referred to as the “Goldilocks Zone” because such a region of space is neither too close to the sun (making it too hot) nor too far away from the sun (making it too cold), but is “just right.” 

In data quality, there is also a Goldilocks Zone, which is the habitable region of time when project conditions are favorable for success. 

Too many projects fail because of lofty expectations, unmanaged scope creep, and the unrealistic belief that data quality problems can be permanently “fixed” rather than requiring eternal vigilance. To succeed, a project must be understood as an iterative process. Return on investment (ROI) comes from targeting well-defined objectives that deliver small, incremental returns, which build momentum toward larger success over time.

Data quality projects are easy to get started, even easier to end in failure, and often lack the decency of at least failing quickly. As with any complex problem, there is no fast and easy solution for data quality.

Projects are launched to understand and remediate the poor data quality that is negatively impacting decision-critical enterprise information. Data-driven problems require data-driven solutions. At the point in the project lifecycle when the team must decide whether the work of the current iteration is ready for implementation, it is dealing with the Data Quality Goldilocks Zone, which is measured not by proximity to the sun but by proximity to full data remediation, otherwise known as perfection.

The obvious problem is that perfection is impossible. An obsessive-compulsive quest to find and fix every data quality problem is a laudable pursuit but ultimately a self-defeating cause. Data quality problems can be insidious, and even the best data remediation process will still produce exceptions. As a best practice, design your process to identify and report exceptions when they occur. In fact, many implementations include logic to suspend exceptions for manual review and correction.
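
To make the idea of suspending exceptions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than a description of any particular tool: the `standardize_country` rule, the record layout, and the `remediate` and `exception_report` helpers are hypothetical names. Records that the rule cannot handle are set aside with a reason instead of being silently dropped or guessed at.

```python
from dataclasses import dataclass, field


@dataclass
class RemediationResult:
    cleaned: list = field(default_factory=list)    # records the rule handled
    suspended: list = field(default_factory=list)  # (record, reason) pairs for manual review


def standardize_country(record):
    """Hypothetical rule: map known country spellings to a two-letter code."""
    known = {"usa": "US", "united states": "US", "u.s.": "US", "canada": "CA"}
    raw = str(record.get("country", "")).strip().lower()
    if raw not in known:
        # The rule refuses to guess; the caller decides what to do with the record.
        raise ValueError(f"unrecognized country: {record.get('country')!r}")
    return {**record, "country": known[raw]}


def remediate(records, rule=standardize_country):
    """Apply the rule to every record, suspending the ones it rejects."""
    result = RemediationResult()
    for record in records:
        try:
            result.cleaned.append(rule(record))
        except ValueError as exc:
            result.suspended.append((record, str(exc)))
    return result


def exception_report(result):
    """Summarize how many records were suspended and at what rate."""
    total = len(result.cleaned) + len(result.suspended)
    rate = len(result.suspended) / total if total else 0.0
    return {"total": total, "suspended": len(result.suspended), "exception_rate": rate}
```

Calling `remediate` on a batch and passing the result to `exception_report` gives the team both the suspended records to review and the rate at which they occur, so the exceptions are never presented without their context.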

Although all of this is easy to accept in theory, it is notoriously difficult to accept in practice. 

For example, imagine that your project is processing one billion records and that exhaustive analysis has determined that the results are correct 99.99999% of the time, meaning that exceptions occur in only 0.00001% of the total data population. Now imagine explaining these statistics to the project team but providing only the 100 exception records for review. Do not underestimate the difficulty that the human mind has with large numbers (e.g., 100 is an easy number to relate to, but one billion is practically incomprehensible). Also, don’t ignore the effect known as “negativity bias,” in which bad evokes a stronger reaction than good in the human mind. Compare an insult and a compliment: which one do you remember more often? Focusing on the exceptions can undermine confidence and prevent acceptance of an overwhelmingly successful implementation.
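
The arithmetic behind this example is easy to check, and showing it alongside the exception records may help keep them in proportion. The short Python sketch below uses only the figures from the example above; the variable names are my own, and `Fraction` is used so that the tiny percentage is not distorted by floating-point rounding.

```python
from fractions import Fraction

total_records = 1_000_000_000          # one billion records
exception_pct = Fraction("0.00001")    # exceptions as a percentage of all records

# Convert the percentage to a rate, then to a record count.
exception_rate = exception_pct / 100
exceptions = total_records * exception_rate
correct = total_records - exceptions
correct_pct = correct / total_records * 100

print(f"exception records: {int(exceptions):,}")
print(f"correct records:   {int(correct):,}")
print(f"correct percent:   {float(correct_pct)}")
```

The 100 exception records sit beside 999,999,900 correct ones, which is the comparison the project team needs to see.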

If you can accept there will be exceptions, admit perfection is impossible, implement data quality improvements in iterations, and acknowledge when the current iteration has reached the Data Quality Goldilocks Zone, then your data quality initiative will not be perfect, but it will be “just right.”
