Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    media monitoring
    Signals In The Noise: Using Media Monitoring To Manage Negative Publicity
    5 Min Read
    data analytics
    How Data Analytics Can Help You Construct A Financial Weather Map
    4 Min Read
    financial analytics
    Financial Analytics Shows The Hidden Cost Of Not Switching Systems
    4 Min Read
    warehouse accidents
    Data Analytics and the Future of Warehouse Safety
    10 Min Read
    stock investing and data analytics
    How Data Analytics Supports Smarter Stock Trading Strategies
    4 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Referential Treatment – The Open Source Reference Data Trend
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Business Intelligence > Referential Treatment – The Open Source Reference Data Trend
Business Intelligence

Referential Treatment – The Open Source Reference Data Trend

SteveSarsfield
SteveSarsfield
6 Min Read
SHARE

Reference data can be used in a huge number of data quality and data enrichment processes.  The simplest example is a table that contains cities and their associated postal codes – you can use an ETL process to make sure that all your customer records that contain 02026 for a postal code always refer to the standardized “Dedham, MA” for the city and state, not variations like “Deadham Mass”  or “Dedam, Massachusetts”.

Reference data is not limited to customer address, however. If everyone were to use the same reference data for parts, you could easily exchange procurement data between partners.  If only certain values are allowed in any given table, it would support validation.  By having standards for supply chain data, procurement, supply chain, finance and accounting data, processes are more efficient.  Organizations like the ISO and ECCMA are working on that.

Availability of Reference Data
In the past, it was difficult to get your hands on reference data. Long ago, no one wanted to share reference data with you…

More Read

Next query: NoSQL and Business Intelligence
Why Blogging is Not Dying
Why Data Isn’t The Only Factor Guiding Your Management Decisions
Moving BI Into The Cloud Part 1
New Features of CRM that has Revolutionized Microsoft Dynamics 365

Reference data can be used in a huge number of data quality and data enrichment processes.  The simplest example is a table that contains cities and their associated postal codes – you can use an ETL process to make sure that all your customer records that contain 02026 for a postal code always refer to the standardized “Dedham, MA” for the city and state, not variations like “Deadham Mass”  or “Dedam, Massachusetts”.

Reference data is not limited to customer address, however. If everyone were to use the same reference data for parts, you could easily exchange procurement data between partners.  If only certain values are allowed in any given table, it would support validation.  By having standards for supply chain data, procurement, supply chain, finance and accounting data, processes are more efficient.  Organizations like the ISO and ECCMA are working on that.

Availability of Reference Data
In the past, it was difficult to get your hands on reference data. Long ago, no one wanted to share reference data with you – you had to send your customer data to a service provider and get the enriched data back.  Others struggled to develop reference data on their own. Lately I’m seeing more and more high quality reference data available for free on the Internet.   For data jockeys, these are good times.

GeoNames
A good example of this is GeoNames.  The GeoNames geographical database is available for download free of charge under a creative commons attribution license. According to the web site, it “aggregates over 100 different data sets to build a list containing over eight million geographical names and consists of 7 million unique features whereof 2.6 million populated places and 2.8 million alternate names. The data is accessible free of charge through a number of web services and a daily database export. “

GeoNames combines geographical data such as names of places in various languages, elevation, population and others from various sources. All lat/long coordinates are in WGS84 (World Geodetic System 1984). Like Wikipedia, users may manually edit, correct and add new names.

US Census Data
Another rich set of reference data is the US Census “Gazetteer” data. Courtesy of the US government, you can download a database with the following fields:

  • Field 1 – State Fips Code
  • Field 2 – 5-digit Zipcode
  • Field 3 – State Abbreviation
  • Field 4 – Zipcode Name
  • Field 5 – Longitude in Decimal Degrees (West is assumed, no minus sign)
  • Field 6 – Latitude in Decimal Degrees (North is assumed, no plus sign)
  • Field 7 – 2000 Population (100%)
  • Field 8 – Allocation Factor (decimal portion of state within zipcode)

So, our Dedham, MA entry includes this data:

  • “25”,”02026″,”MA”,”DEDHAM”,71.163741,42.243685,23782,0.003953

It’s Really Exciting!
When I talk about reference data at parties, I immediately see eyes glaze over and it’s clear that my fellow party-goers want to escape my enthusiasm for it.  But this availability of reference data is really great news! Together with the open source data integration tools like Talend Open Studio, we’re starting to see what I like to call “open source reference data” becoming available. It all makes the price of improving data quality much lower and our future much brighter.

There’s so much to talk about with regard to reference data and so many good sources.  I plan to make more posts on this topic, but feel free to post your beloved reference data sources here in the comments section.

Link to original post

TAGGED:data integration
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai in video game development
Machine Learning Is Changing iGaming Software Development
Exclusive Machine Learning News
media monitoring
Signals In The Noise: Using Media Monitoring To Manage Negative Publicity
Analytics Exclusive Infographic
data=driven approach
Turning Dead Zones Into Data-Driven Opportunities In Retail Spaces
Big Data Exclusive Infographic
smarter manufacturing
Connecting the Factory Floor: Efficient Integration for Smarter Manufacturing
Infographic News

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

Big Data Is More Valuable with Kapow

6 Min Read

Data-Driven BPM: Making “Big Data” Actionable

4 Min Read

Plumbing the Salesforce Clouds is Your Business

10 Min Read

Customer Data Integration – Separating the Hype from the Reality

4 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?