Why Data Quality Is of Utmost Importance in Information-Centric Organisations

April 1, 2014
High quality data will lead to valuable information and insights for your organisation, but obtaining high quality data is easier said than done. Improving your data quality and sustaining a good quality data output should be at the centre of your Big Data Strategy.

Last week I was invited to speak at the Asia Pacific Data Quality Conference in Melbourne, Australia, and I’d like to share some of the insights from this event, as proper data governance could set your organisation apart from your competitors.

Data quality starts with the right Master Data Management processes within your organisation, as Master Data forms the basis of your Big Data analytics. Master Data of the right quality is complete, accurate and consistent, available, time-stamped and based on industry standards. Improving the data results in reduced costs, improved efficiency and better insights, and it enables collaboration across verticals.

Complete data means that all relevant data for a customer is linked and entered in the database. Obtaining complete data records is often a challenge for organisations, as salespeople forget to ask for certain information or customers do not see the benefit of providing all the required details. The Australian insurer IAG has found a solution to this problem: it will be launching a so-called “data-discount” for its customers. This means that IAG will offer customers all sorts of discounts (free movie tickets, discounts on their fees, etc.) in return for complete data records.
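To make the idea of a complete record concrete, here is a minimal Python sketch of a completeness check. The field names in `REQUIRED_FIELDS` and the sample customer are purely hypothetical and are not taken from IAG or any other organisation mentioned here; a real list of required fields would depend on your own Master Data model.

```python
from typing import List, Mapping, Sequence

# Hypothetical list of fields a "complete" customer record must carry.
REQUIRED_FIELDS = ("name", "email", "phone", "postcode")


def missing_fields(record: Mapping[str, object],
                   required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def completeness(record: Mapping[str, object],
                 required: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Share of required fields that are filled in, between 0.0 and 1.0."""
    if not required:
        return 1.0
    return 1 - len(missing_fields(record, required)) / len(required)


# Illustrative record: the email is blank and the postcode is missing.
customer = {"name": "A. Example", "email": "", "phone": "0400 000 000"}
print(missing_fields(customer))  # ['email', 'postcode']
print(completeness(customer))    # 0.5
```

A score like this could be reported per record or averaged per branch, which makes gaps visible at the moment the data is entered rather than months later.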

Accurate and consistent data is all about ensuring that data is entered correctly, without misspellings, typos and/or random abbreviations. A good example is the American logistics company US Xpress, which found 178 different ways of writing “Walmart” in its database when implementing its Big Data Strategy. Of course, if you want to link all data, this should be prevented. IAG has also found a solution to this problem. As Ram Kumar, the CIO of IAG, explained, the company has introduced quality awards (monthly, quarterly and annual) for its branches to ensure that all data is accurate and consistent. The branch that achieves the highest data quality receives the award. This has resulted in quality levels shooting up from 65% to 95% within just a few months.
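One common way to tame spelling variants like these is to map every raw value onto a single canonical name. The sketch below is only an illustration of that idea and not how US Xpress or IAG actually solved it; the alias table and the sample spellings are assumptions made up for the example.

```python
import re
from typing import Optional

# Hypothetical alias table: normalised key -> canonical name.
ALIASES = {
    "walmart": "Walmart",
    "walmartinc": "Walmart",
    "walmartstores": "Walmart",
}


def normalise_key(raw: str) -> str:
    """Lowercase the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


def canonical_name(raw: str) -> Optional[str]:
    """Return the canonical spelling, or None if the value needs review."""
    return ALIASES.get(normalise_key(raw))


for value in ["Wal-Mart", "WAL MART", "Walmart, Inc.", "Wallmart"]:
    print(value, "->", canonical_name(value))
# Wal-Mart -> Walmart
# WAL MART -> Walmart
# Walmart, Inc. -> Walmart
# Wallmart -> None
```

Values that come back as `None`, such as the misspelt “Wallmart”, are deliberately not guessed at. They can be queued for a person to review, which keeps the alias table accurate as it grows.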

The data should of course be available and easily accessible to users at all times, and they should not need to search for it manually. In addition, the data should be time-stamped: it should be clear when the data was created, changed and/or deleted, and by whom, and the data should be sufficiently up to date for the task at hand. Finally, it should adhere to industry standards so that it can be exchanged among companies and verticals. Especially for data related to the Internet of Things and the Industrial Internet, arranging this in the near future will be a challenge.
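As a rough illustration of time-stamping, the following Python sketch keeps track of who created and last changed a record and checks whether it is fresh enough for a given task. The class name `AuditedRecord`, its fields and the sample values are hypothetical; deletions could be tracked the same way with an extra pair of fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AuditedRecord:
    """A value plus who created and last changed it, and when."""
    value: str
    created_at: datetime
    created_by: str
    updated_at: datetime
    updated_by: str

    def update(self, value: str, user: str,
               now: Optional[datetime] = None) -> None:
        """Change the value and record who changed it and when."""
        self.value = value
        self.updated_at = now or datetime.now(timezone.utc)
        self.updated_by = user

    def is_fresh(self, max_age: timedelta,
                 now: Optional[datetime] = None) -> bool:
        """True if the last change is recent enough for the task at hand."""
        now = now or datetime.now(timezone.utc)
        return now - self.updated_at <= max_age


# Illustrative values only.
created = datetime(2014, 3, 1, tzinfo=timezone.utc)
record = AuditedRecord("12 Example St", created, "alice", created, "alice")
record.update("34 Sample Rd", "bob",
              now=datetime(2014, 3, 20, tzinfo=timezone.utc))

check_time = datetime(2014, 4, 1, tzinfo=timezone.utc)
print(record.updated_by)                                    # bob
print(record.is_fresh(timedelta(days=30), now=check_time))  # True
print(record.is_fresh(timedelta(days=7), now=check_time))   # False
```

What counts as sufficiently up to date differs per task, which is why the maximum age is passed in by the caller rather than fixed in the record.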

The Australian telecom organisation Telstra has created a data quality firewall to ensure the quality of its data. The firewall improves the way Telstra’s information factory delivers data, so that the data is available when required, complete and error-free. It provides 24×7 automated monitoring of data in Telstra’s Enterprise Data Warehouse and raises alerts when data quality issues are detected.
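The details of Telstra’s implementation are not covered here, but the general idea of automated checks that raise alerts can be sketched in a few lines of Python. Everything in this example, including the rule names, the column names `customer_id` and `amount`, and the alert format, is a hypothetical assumption for illustration.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]


def has_customer_id(row: Row) -> bool:
    """The row must carry a non-empty customer identifier."""
    return bool(row.get("customer_id"))


def amount_is_non_negative(row: Row) -> bool:
    """The amount must be a number that is zero or greater."""
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


# Hypothetical rule set; a real one would come from your data standards.
RULES: Dict[str, Rule] = {
    "customer_id present": has_customer_id,
    "amount is non-negative": amount_is_non_negative,
}


def check_batch(rows: Iterable[Row],
                rules: Dict[str, Rule] = RULES) -> List[str]:
    """Return one alert message for every rule that a row fails."""
    alerts = []
    for index, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                alerts.append(f"row {index}: failed '{name}'")
    return alerts


batch = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": -5},
]
for alert in check_batch(batch):
    print(alert)
# row 1: failed 'customer_id present'
# row 1: failed 'amount is non-negative'
```

In practice a check like this would run on a schedule against each incoming load, and the alerts would be sent to the people responsible for the source system instead of being printed.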

The above may seem obvious and easy to ensure, as there are sufficient technologies and techniques available in the market that could help. However, the problem is not so much a technology problem as a human one, and that makes it a lot more difficult to solve. After all, a ‘single source of truth’ that ensures the quality of the data is often not achieved because employees fail to meet the set standards. Small errors or breaches on their own are often not a problem, but when they align, as described by the so-called Swiss Cheese Model of Accident Causation, they can eventually lead to a major data breach. So all sources are important, and all employees involved with data at any point in time should understand the importance of high quality data.

The best way to sell the issue of data quality to your employees and management is therefore to determine the costs involved with wrong data and data breaches, and the amount of lost revenue they could result in. According to IBM, data-quality related problems can result in millions of dollars in lost revenue, unhappy and lost customers, failure to meet regulatory compliance (and eventually corresponding fines) as well as failure of information-intensive projects.

In the end, if data is not managed correctly and its quality is not ensured, your data can become a risky liability instead of a valuable asset. So, within a data-driven, information-centric organisation, everyone should be aware of the importance of high quality data to ensure customer engagement and eventually a positive bottom line for the company.