Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
    data analytics for trademark registration
    Optimizing Trademark Registration with Data Analytics
    6 Min Read
    data analytics for finding zip codes
    Unlocking Zip Code Insights with Data Analytics
    6 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Reference Domains Part III: Collecting Classifications
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Data Management > Best Practices > Reference Domains Part III: Collecting Classifications
AnalyticsBest PracticesData Warehousing

Reference Domains Part III: Collecting Classifications

zamaes
zamaes
8 Min Read
SHARE
This is part three in the series about reference domains (a.k.a. classifications). In the first part we looked at the nature of classifications. In part two, we discussed a recommended approach to designing the target structures in the EDW to accommodate them. Here we will look at the work of collecting and documenting reference domains.
 
The compilation of a master list of reference domains and their values straddles the analysis and design phases of development.
This is part three in the series about reference domains (a.k.a. classifications). In the first part we looked at the nature of classifications. In part two, we discussed a recommended approach to designing the target structures in the EDW to accommodate them. Here we will look at the work of collecting and documenting reference domains.
 
The compilation of a master list of reference domains and their values straddles the analysis and design phases of development. For external classifications, those captured from the source system, the work must be done in the analysis phase, while internal classifications, being dependent on the design of the target data structures, are collected during the design phase.
 
As mentioned in part one, the external reference domains may first be encountered as code values in a source table; perhaps a status of a bank account, or a categorization of a customer. If you’re lucky, the word “code”, “type”, or “indicator” will appear in the column name. But it’s equally possible that the name will not make it immediately apparent that this is a classification. The analyst will need to determine it from the documentation or from the values contained within the data. Cryptic or abbreviated codes can take the form of numbers or character-based strings. It is important for the analyst to read the clues in the table layout, the documentation and the data to identify potential classifications and then follow up with the data steward or external provider of the source data to find the descriptions that explain the codes.
 
The code descriptions may accompany the codes in the same source table. However, even if this is the case, selecting a distinct set of the codes and descriptions from the table may not return a full set of potential values, only those that are currently being used. For example, pulling a distinct set of status types from a bank account table may return the values “open” and “closed”, but miss the fact that sometimes accounts are briefly “pending”. For this reason, it’s very important to find either a master table in the source that contains the full set of possible codes and values, or the documentation where the full set is listed. In either case, it’s essential to examine the list of occurring values and compare it against the master. The master may be out of sync with the newest list of allowable values, or there may be aberrant values appearing in the data which will need to be dealt with.
 
In some cases, no descriptions will exist. Business users may know the meaning of the codes, without requiring the definition. In these cases, it is best practice to create the description and include it in the reference value table. Codes should have explicit descriptions wherever possible, to avoid miscommunication and potential errors. It is a simple matter to misinterpret a status code of “A” as “Active”, when it actually means “Archived”.
  
Internal reference domains are quite different, in that they are entirely in the control of the EDW design team. The codes and descriptions are all generated and maintained internally. They take the same form of the external classifications, being codes and descriptions, and need to be compiled together with them in a master document and stored in the same reference tables in the EDW.
For compiling the master list, I recommend a pair of spreadsheets, one with a list of the reference domains (e.g., Account Type, Marital Status etc.) and the other a list of reference values for each domain. Here are some points to consider as you create and maintain this list:
  1. Manually set the surrogate keys
    Rather than simply pass the setting of surrogate keys over to the ETL process, this master list is an opportunity to control the keys being applied. In most cases, each domain will contain a relatively short list of values (<50) and this will be a simple task. The control and predictability it supplies can be useful for development, testing and usage of the database.
  2. Reserve ranges of values for the surrogate keys
    It’s a good idea to leave room between the domains of values to allow the reference domain to grow or be corrected over time. These ranges will not be uniform; one domain will reserve twenty sequential numbers while another will require a thousand.
  3. Create dummy values for “Unknowns”
    Many times, incoming records will not find a match to the allowable domain of values. Rather than leave these values null, a surrogate key corresponding to an “unknown” reference value within the domain can be applied.
  4. Create dummy values at finer grain
    In some cases it may be prudent to distinguish between different types of “unknown” values. For example, records for which a classification is not required could be marked “Not Applicable”, while an incoming value that is out of the allowable range could be assigned “Out of Range”. When applying this finer grain consideration should be given to what is practical as well as what will add value.
  5. Create load-ready spreadsheet
    Some external classifications can be loaded from tables, while others, along with all the internal classifications will need to be manually entered. In both cases, a spreadsheet can be used to control the process. I have seen an implementation where a single spreadsheet contained both manual entries and references to tables, and was used as input to the ETL load process.

In the final part of this series we will look at the collection of metadata related to reference domains and how it can be used to assist the governance of data content.

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ESG reporting software
Data Shows How ESG Reporting Software Helps Companies Achieve Sustainability Goals
Big Data Infographic
ai in marketing
AI Helps Businesses Develop Better Marketing Strategies
Artificial Intelligence Exclusive
agenic ai
How Businesses Are Using AI to Make Smarter, Faster Decisions
Artificial Intelligence Exclusive
accountant using ai
AI Improves Integrity in Corporate Accounting
Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

DIALOG Improving Customer Experience in Health Insurance

6 Min Read

The Commoditization of Analytics

7 Min Read

Big Data and Analytics – Suggestions to Approach

4 Min Read

New Trends in BI, Analytics and Social Media

9 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai chatbot
The Art of Conversation: Enhancing Chatbots with Advanced AI Prompts
Chatbots
giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?