Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    image fx (67)
    Improving LinkedIn Ad Strategies with Data Analytics
    9 Min Read
    big data and remote work
    Data Helps Speech-Language Pathologists Deliver Better Results
    6 Min Read
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Open Data Grey Areas
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Quality > Open Data Grey Areas
Data Quality

Open Data Grey Areas

MIKE20
MIKE20
0 Min Read
Image
SHARE

ImageIn a previous post, I discussed some In a previous post, I discussed some data quality and data governance issues associated with open data. In his recent blog post How far can we trust open data?, Owen Boswarva raised several good points about open data.

Contents
Data QualityThird-Party RightsPersonal Data

“The trustworthiness of open data,” Boswarva explained, “depends on the particulars of the individual dataset and publisher. Some open data is robust, and some is rubbish. That doesn’t mean there’s anything wrong with open data as a concept. The same broad statement can be made about data that is available only on commercial terms. But there is a risk attached to open data that does not usually attach to commercial data.”

Data quality, third-party rights, and personal data were three grey areas Boswarva discussed. Although his post focused on a specific open dataset published by an agency of the government of the United Kingdom (UK), his points are generally applicable to all open data.

Data Quality

As Boswarva remarked, the quality of a lot of open data is high even though there is no motivation to incur the financial cost of verifying the quality of data being given away for free. The “publish early even if imperfect” principle also encourages a laxer data quality standard for open data. However, “the silver lining for quality-assurance of open data,” Boswarva explained is that “open licenses maximize re-use, which means more users and re-users, which increases the likelihood that errors will be detected and reported back to the publisher.”

More Read

Text Analytics for Legacy BI Analysis
3 Big Data Pitfalls and How to Avoid Them
Data Quality Magic
Data by the Book: You Don’t Know What You’ve Got Until It’s Gone
The Tooth Fairy of Data Quality

Third-Party Rights

The issue of third-party rights raised by Boswarva was one that I had never considered. His example was the use of a paid third-party provider to validate and enrich postal address data before it is released as part of an open dataset. Therefore, consumers of the open dataset benefit from postal validation and enrichment without paying for it. While the UK third-party providers in this example acquiesced to open re-use of their derived data because their rights were made clear to re-users (i.e., open data consumers), Boswarva pointed out that re-users should be aware that using open data doesn’t provide any protection from third-party liability and, more importantly, doesn’t create any obligation on open data publishers to make sure re-users are aware of any such potential liability. While, again, this is a UK example, that caution should be considered applicable to all open data in all countries.

Personal Data

As for personal data, Boswarva noted that while open datasets are almost invariably non-personal data, “publishers may not realize that their datasets contain personal data, or that analysis of a public release can expose information about individuals.” The example in his post centered on the postal addresses of property owners, which without the names of the owners included in the dataset, are not technically personal data. However, it is easy to cross-reference this with other open datasets to assemble a lot of personally identifiable information that if it were contained in one dataset would be considered a data protection violation (at least in the UK).

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

image fx (2)
Monitoring Data Without Turning into Big Brother
Big Data Exclusive
image fx (71)
The Power of AI for Personalization in Email
Artificial Intelligence Exclusive Marketing
image fx (67)
Improving LinkedIn Ad Strategies with Data Analytics
Analytics Big Data Exclusive Software
big data and remote work
Data Helps Speech-Language Pathologists Deliver Better Results
Analytics Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

CTOvision Big Data Reporting for 2012

9 Min Read

From “The Farm” to FarmVille

4 Min Read

Data, Data and More Data [Infographic]

1 Min Read

The Secrets to Big Data and Information Optimization Revealed in 2013 Research Agenda

11 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?