Reference Domains Part IV: Metadata & Governance

zamaes
Last updated: 2011/09/13 at 5:00 PM

This is the fourth and final part in the series on working with reference domains, also called classifications. The first part provided an overview of their nature, the second recommended an approach to data modelling, and the third explored collecting and documenting them. Here we will discuss metadata related to classifications and how it can be used to assist the governance of content, with particular reference to data quality.
 
Profiling
 
Classifications will first be encountered through the analysis process. Once the reference domain is identified and the master source of the full list of codes and descriptions is found, this data can be compared against profile results to determine the integrity of the data. Imagine that the field under investigation is the marital status of an individual. The master source reveals that the full list of codes and descriptions includes: 1=Married, 2=Single, 3=Divorced. The table-level profile shows that the minimum value is “1”, while the maximum value is “4”. Since “4” has no entry in the master list, the source holds at least one value outside the expected range. With the profile output stored as metadata, and the classifications loaded into reference tables, it is possible to automatically test that the actual values found in the source are within the range expected in the reference tables.
Similarly, a more detailed test could be run at the column level, with the frequency distribution output compared against the reference values to check that no aberrant values appear.
Aberrant values could be a sign of integrity issues, or may indicate that additional values need to be added to the reference tables.
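
The two tests described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the reference values come from the marital status example, while the sample column contents, the function names and the in-memory dictionaries standing in for the profile metadata and reference tables are all assumptions made for the sketch.

```python
from collections import Counter

# Reference values from the marital status example in the text.
# In practice these would be read from the reference tables (assumption).
REFERENCE_VALUES = {1: "Married", 2: "Single", 3: "Divorced"}


def range_check(profile_min, profile_max, reference):
    """Table-level test: do the profiled min and max fall inside the reference range?"""
    codes = sorted(reference)
    return codes[0] <= profile_min and profile_max <= codes[-1]


def aberrant_values(frequencies, reference):
    """Column-level test: return the codes (with counts) that have no reference entry."""
    return {code: count for code, count in frequencies.items() if code not in reference}


# Hypothetical source column; a real profile would come from a profiling tool.
source_column = [1, 2, 2, 3, 4, 1, 4]

# Stand-ins for the stored profile output (assumption).
profile = {"min": min(source_column), "max": max(source_column)}
frequencies = Counter(source_column)

in_range = range_check(profile["min"], profile["max"], REFERENCE_VALUES)
unexpected = aberrant_values(frequencies, REFERENCE_VALUES)

print("Within reference range:", in_range)
print("Aberrant values and counts:", unexpected)
```

The range check is cheap but coarse, since it would miss an invalid code that happens to lie between the minimum and maximum. The frequency comparison catches those cases and also shows how often each aberrant value occurs, which helps in deciding whether it is a data error or a missing reference value.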
 
In order to make full use of this comparison of reference domain values and profiling results, it is important to collect the external classifications as part of the analysis process. This will allow the team to catch anomalies early and avoid rework.
 
Data Content Governance
 
For external classifications there may well be decisions to be made around the collection and consolidation of reference domains. A protocol should be developed to address any issues with inconsistent domains of values. Multiple domains will need to be rationalized into a single set of values that will be acceptable to all lines of business. Care needs to be taken to ensure the most authoritative source has been identified, and that a process is in place to handle change notification. This is particularly important in situations where the source is a hard-coded list drawn from documentation.  
 
Naming Standards Governance
 
For internal classifications, the content is not subject to content governance so much as the enforcement of naming standards. This is especially important in the naming of relationships, to ensure the nature of the relationship is being accurately described. The time to do this is as the logical mapping document is passed through the screening process to govern all logical names.  
 
Data Architecture Governance
 
The vast majority of reference domains should pose no challenge to data architecture governance. Most data elements will fit neatly into the simple structures of the Reference Domain and Reference Value tables described in part three of this series. There may be a decision to house long lists of values in separate tables, with a threshold set as the assessment criterion. For example, a classification containing more than 500 values would be held in its own reference table. This would be done to improve access performance, although it may not be required and should be tested to determine suitability. If a threshold is used to influence design, the profile results can again be used to programmatically assist the design process.
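
As a rough sketch of how profile results could drive the threshold decision, the Python below assigns each classification to either the shared structure or its own table based on its distinct-value count. The threshold of 500 is the example figure from the text; the domain names, the counts and the function name are hypothetical.

```python
# Example threshold from the text; the right figure should come from testing.
SEPARATE_TABLE_THRESHOLD = 500


def table_placement(distinct_counts, threshold=SEPARATE_TABLE_THRESHOLD):
    """Suggest where each classification should be housed, based on its value count."""
    return {
        domain: "own reference table" if count > threshold else "shared Reference Value table"
        for domain, count in distinct_counts.items()
    }


# Hypothetical distinct-value counts taken from column profiles.
profile_distinct_counts = {
    "marital_status": 3,
    "country": 249,
    "postal_code": 48000,
}

placement = table_placement(profile_distinct_counts)
for domain, target in placement.items():
    print(f"{domain}: {target}")
```

The output is only a suggestion for the design review. A classification exactly at the threshold stays in the shared table here, because the example in the text says "more than 500".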
 
Likewise, there may be a call to create special structures for classifications that have unique attribution or particular structures. For instance, a set of classifications may form a balanced tree hierarchy that could be usefully held in denormalized structures. Again, these exceptions should be rare, and I would suggest they be avoided, with a premium placed on consistency of design.
Model validation should ensure that the length of the source fields is accommodated by the target reference tables. The table profile results can be referenced to make this determination automatically.
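
The length validation could be automated along the following lines. This is a sketch under stated assumptions: the maximum observed lengths are presumed to be available from the table profile, and the field names, the lengths and the target column length of 10 are invented for illustration.

```python
def length_violations(profile_max_lengths, target_length):
    """Return the source fields whose longest profiled value exceeds the target column length."""
    return {
        field: length
        for field, length in profile_max_lengths.items()
        if length > target_length
    }


# Hypothetical maximum value lengths per source field, taken from table profiles.
profile_max_lengths = {
    "marital_status_cd": 1,
    "country_cd": 3,
    "product_class_cd": 12,
}

# Assumed length of the code column in the target reference table.
TARGET_CODE_LENGTH = 10

violations = length_violations(profile_max_lengths, TARGET_CODE_LENGTH)
print("Fields that would be truncated:", violations)
```

Any field reported here would need either a wider target column or a decision to handle that classification as an exception.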
 
This completes the series on reference domains. Please feel free to provide your feedback. What challenges have you faced with classifications? How did you resolve them?

TAGGED: data governance, reference domains