On Trees, Data Quality, and Big Data
I recently had some palm trees put into my backyard in my Nevada home. It was a downright cool experience that required industrial cranes to lift the two-ton trees above my home.
There was no damage done to my home or the driveway and nobody was injured. Everything went well. Well, almost everything. It turns out that the truck carrying these trees was delayed.
Did the drivers have a hard time loading these monstrosities? No. Was the enormous truck able to snake its way into my community? Yes. So, what was the problem?
Good old human error. When I bought the trees, my friend Jeff accompanied me. Jeff knows a thing or two about landscaping and I’m anything but a palm tree expert. I paid for the trees and gave the woman at the counter my proper address. I assumed that all was good to go.
Fast forward to tree delivery day. After a few hurried calls and general wonderment about where these things were, we identified the culprit. The saleswoman wrote down Jeff’s address on the deliver-to line, not mine. She put my address in the ‘notes’ section. For their part, the delivery guys didn’t read the notes and wound up driving 60 miles out of the way.
I’ve seen many parallels between palm trees and enterprise data in my career. I’ve had users question the accuracy of my reports. I might hear things like “There’s no way that we had that many promotions last month! Your report is wrong!”
While I’m not perfect, I would often tell the skeptical user that we should check the data in the source system. More often than not, my report was accurate but the data pulled into that report was not. Thanks to audit tables and metadata, I could typically pinpoint the time, date, and creator of the errant record.
I would then work backwards. That is, after we knew that a user made this mistake, I would ask the natural next questions:
- What other mistakes did this user make?
- What else do we have to clean up?
- Is there a larger departmental or organizational training issue?
- Couldn’t we write a business rule or audit report to prevent the recurrence of this problem?
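To make that last question concrete, here is a minimal Python sketch of a business rule paired with an audit report. Everything in it is an assumption for illustration: the `PromotionRecord` fields, the rule itself (no employee should be promoted twice in the same month), and the sample rows are all made up, and a real source system would have its own audit columns and rules.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PromotionRecord:
    # Hypothetical fields; a real audit table will have its own columns.
    employee_id: str
    effective: date
    created_by: str


def duplicate_promotions(records):
    """Business rule (assumed): flag a second promotion for the same
    employee within the same calendar month."""
    seen = set()
    flagged = []
    for rec in records:
        key = (rec.employee_id, rec.effective.year, rec.effective.month)
        if key in seen:
            flagged.append(rec)
        else:
            seen.add(key)
    return flagged


def errors_by_creator(flagged):
    """Audit report: count flagged records per user who entered them."""
    return Counter(rec.created_by for rec in flagged)


# Made-up sample rows, not data from any real system.
records = [
    PromotionRecord("E100", date(2013, 5, 1), "asmith"),
    PromotionRecord("E100", date(2013, 5, 15), "asmith"),
    PromotionRecord("E200", date(2013, 5, 1), "bjones"),
    PromotionRecord("E300", date(2013, 5, 3), "asmith"),
    PromotionRecord("E300", date(2013, 5, 20), "asmith"),
]

flagged = duplicate_promotions(records)
report = errors_by_creator(flagged)
```

The per-creator count speaks to the first and third questions as well: if one user accounts for most of the flagged records, that may point to a training issue rather than a one-off slip.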
Everyone makes mistakes, and I’m certainly no exception. The larger point here is that data matters, especially the accurate kind. One of my favorite expressions is PICNIC, aka problem in chair, not in computer. We can do simply amazing things with Big Data, but I’ll always insist that Small Data and data quality are just as important.
Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management. The MIKE2.0 Methodology has been built to support our belief that information really is one of the most crucial assets of a business. We believe meaningful, cost-effective Business and Technology processes can only be achieved with a successful approach for ...