Schrödinger’s Data Quality

May 21, 2009

In 1935, Austrian physicist Erwin Schrödinger described a now-famous thought experiment in which:

“A cat, a flask containing poison, a tiny bit of radioactive substance and a Geiger counter are placed into a sealed box for one hour. If the Geiger counter doesn’t detect radiation, then nothing happens and the cat lives. However, if radiation is detected, then the flask is shattered, releasing the poison which kills the cat. According to the Copenhagen interpretation of quantum mechanics, until the box is opened, the cat is simultaneously alive and dead. Yet, once you open the box, the cat will either be alive or dead, not a mixture of alive and dead.” 

This was only a thought experiment. Therefore, no actual cat was harmed. 

This paradox of quantum physics, known as Schrödinger’s Cat, poses the question:

“When does a quantum system stop existing as a mixture of states and become one or the other?”

Unfortunately, data quality projects are not thought experiments. They are complex, time-consuming and expensive enterprise initiatives. Typically, a data quality tool is purchased, expert consultants are hired to supplement staffing, production data is copied to a development server and the project begins. Until it is completed and the new system goes live, the project is both a potential success and a potential failure. Yet, once the new system starts being used, the project will become either a success or a failure.

This paradox, which I refer to as Schrödinger’s Data Quality, poses the question:

“When does a data quality project stop existing as a potential success or failure and become one or the other?”

Data quality projects should begin with the parallel and complementary efforts of drafting the business requirements while also performing a data quality assessment, which can help you:

  • Verify data matches the metadata that describes it
  • Identify potential missing, invalid and default values
  • Prepare meaningful questions for subject matter experts
  • Understand how data is being used
  • Prioritize critical data errors
  • Evaluate potential ROI of data quality improvements
  • Define data quality standards
  • Reveal undocumented business rules
  • Review and refine the business requirements
  • Provide realistic estimates for development, testing and implementation
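
As a rough illustration of the first few items above, here is a minimal Python sketch of a column-level assessment that counts missing, default, invalid and valid values. The sample records, the field names, the list of default placeholders and the validation patterns are all assumptions made up for this example; a real assessment would take them from the project's own metadata and subject matter experts.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"customer_id": "1001", "postal_code": "02139", "birth_date": "1975-04-12"},
    {"customer_id": "1002", "postal_code": "", "birth_date": "1900-01-01"},
    {"customer_id": "1003", "postal_code": "ABCDE", "birth_date": "1982-11-30"},
    {"customer_id": "1004", "postal_code": "99999", "birth_date": None},
]

# Assumed rules: which values are default placeholders, and what a
# well-formed value looks like for each field.
DEFAULTS = {"postal_code": {"99999", "00000"}, "birth_date": {"1900-01-01"}}
PATTERNS = {
    "customer_id": re.compile(r"\d+"),
    "postal_code": re.compile(r"\d{5}"),
    "birth_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def profile(rows, field):
    """Count missing, default, invalid and valid values for one field."""
    counts = Counter()
    for row in rows:
        value = row.get(field)
        if value is None or value.strip() == "":
            counts["missing"] += 1
        elif value in DEFAULTS.get(field, set()):
            counts["default"] += 1
        elif not PATTERNS[field].fullmatch(value):
            counts["invalid"] += 1
        else:
            counts["valid"] += 1
    return dict(counts)

# One small report per field, suitable for discussion with subject matter experts.
report = {field: profile(records, field) for field in PATTERNS}
for field, counts in report.items():
    print(field, counts)
```

Counts like these are only a starting point. Their main value is in the questions they raise, such as why a placeholder birth date appears at all.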

Therefore, the data quality assessment helps align perception with reality and gets the project off to a good start by providing a clear direction and a working definition of success.

However, a common mistake is to view the data quality assessment as a one-time event that ends when development begins. 

Projects should perform iterative data quality assessments throughout the entire development lifecycle, which can help you:

  • Gain a data-centric view of the project’s overall progress
  • Build data quality monitoring functionality into the new system
  • Promote data-driven development
  • Enable more effective unit testing
  • Perform impact analysis on requested enhancements (i.e., scope creep)
  • Record regression cases for testing modifications
  • Identify data exceptions that require suspension for manual review and correction
  • Facilitate early feedback from the user community
  • Correct problems that could undermine user acceptance
  • Increase user confidence that the new system will meet their needs
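
One way to picture an iterative assessment is to rerun the same measurements on every build and compare them with an earlier baseline. The Python sketch below is a simplified example under stated assumptions: the completeness metric, the 5% tolerance, the field names and the two sample snapshots are all hypothetical, chosen only to show the idea of flagging regressions and suspending exception records for manual review.

```python
def completeness(rows, field):
    """Share of rows where the field is populated (0.0 to 1.0)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

def assess(rows, fields):
    """Measure completeness for each field in one snapshot of the data."""
    return {field: completeness(rows, field) for field in fields}

def regressions(baseline, current, tolerance=0.05):
    """Fields whose completeness dropped by more than the tolerance."""
    return sorted(
        field for field in baseline
        if baseline[field] - current.get(field, 0.0) > tolerance
    )

def split_exceptions(rows, required):
    """Separate rows that pass from rows suspended for manual review."""
    passed, suspended = [], []
    for row in rows:
        ok = all(row.get(field) not in (None, "") for field in required)
        (passed if ok else suspended).append(row)
    return passed, suspended

# Hypothetical snapshots taken at two points in the development lifecycle.
FIELDS = ["email", "phone"]
snapshot_1 = [
    {"email": "a@example.com", "phone": "555-0101"},
    {"email": "b@example.com", "phone": "555-0102"},
    {"email": "c@example.com", "phone": "555-0103"},
    {"email": "d@example.com", "phone": "555-0104"},
]
snapshot_2 = [
    {"email": "a@example.com", "phone": "555-0101"},
    {"email": "b@example.com", "phone": ""},
    {"email": "c@example.com", "phone": None},
    {"email": "d@example.com", "phone": "555-0104"},
]

baseline = assess(snapshot_1, FIELDS)
current = assess(snapshot_2, FIELDS)
dropped = regressions(baseline, current)
passed, suspended = split_exceptions(snapshot_2, FIELDS)
print("Regressed fields:", dropped)
print("Suspended for review:", len(suspended))
```

Saving each run's results alongside the build gives the project a data-centric progress record, and the suspended rows can double as regression cases when the code is later modified.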

If you wait until the end of the project to learn if you have succeeded or failed, then you treat data quality like a game of chance.

And to paraphrase Albert Einstein:

“Do not play dice with data quality.”
