5 Non-Quality Items to Consider in Data Profiling

DataQualityEdge

Data profiling is all about identifying and quantifying the quality of the data in a database. How complete is it? How accurate is it? These questions are must-haves in any and all data profiling activities.

However, for any data profiling project to be complete, we must also look at the metadata. Analyzing the data tells us how many widgets were sold, but we also need to know when the load job runs, who the business owner is, and a few other items that would make your data profiling project a masterpiece.

1. Table Details: Remember that each table has a purpose for existing. What is that purpose? While you’re checking the data within a table, don’t forget the metadata on the table. Table physical and logical names and descriptions are essential elements to track. Performing a data profile on data within a large table is great, but if the table was last accessed 4 years ago, your efforts may yield little ROI. Note when each table was last used, then go back to the business and ask: do they still need this data?

Some table details to consider:
Table Name
Table Description
Table Relationships
Table Usage Details (last user, frequency used)
Touch Points
Table Metadata (attribute definitions)
Known Table Issues
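
As a rough illustration, the table details above could be kept in a small record, with a simple check for tables nobody has touched in a long time. This is only a sketch: the `TableProfile` class, its field names, the `is_stale` helper, and the one-year threshold are all assumptions made for the example, not part of any profiling tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class TableProfile:
    """Non-quality details recorded alongside a table's data profile (hypothetical layout)."""
    physical_name: str
    logical_name: str
    description: str
    related_tables: List[str] = field(default_factory=list)
    last_accessed: Optional[date] = None   # None means usage is unknown
    last_user: Optional[str] = None
    known_issues: List[str] = field(default_factory=list)


def is_stale(profile: TableProfile, today: date, max_idle_days: int = 365) -> bool:
    """Return True when the table has not been accessed within max_idle_days."""
    if profile.last_accessed is None:
        return False  # unknown usage: do not flag it, go and find out instead
    return (today - profile.last_accessed).days > max_idle_days


# Made-up example table, last accessed four years before the check date.
orders = TableProfile(
    physical_name="ORD_HDR",
    logical_name="Order Header",
    description="One row per customer order",
    related_tables=["ORD_LINE"],
    last_accessed=date(2020, 1, 15),
)

stale = is_stale(orders, today=date(2024, 1, 15))
print(stale)  # True
```

A table flagged this way is a candidate for the "do you still need this data?" conversation before any profiling effort is spent on it.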

2. Load Details: Whether your data is loaded through DataStage, mainframe jobs, or other data loader tools, you will need to record every job that touches a table and how each job’s data is distributed within that table. For large tables that hold data from multiple business units, this can be a monumental task.

Some load details to consider:
Job Name/Number
Job Frequency
Last Load
File Used
File Layouts
Known Failures
Known Corrective Actions
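
One way to keep track of "every job that touches a table" is to record each job once and build an index from table to jobs. The sketch below assumes a hypothetical `LoadJob` record and made-up job, file, and table names; in practice this information would come from your scheduler or load tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoadJob:
    """One load job and the table it writes to (hypothetical layout)."""
    job_name: str
    frequency: str       # e.g. "daily", "weekly"
    last_load: str       # ISO date of the last successful load
    source_file: str
    target_table: str


def jobs_by_table(jobs: List[LoadJob]) -> Dict[str, List[str]]:
    """Map each target table to the sorted names of the jobs that load it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for job in jobs:
        index[job.target_table].append(job.job_name)
    return {table: sorted(names) for table, names in index.items()}


# Made-up jobs: two business units feed the same SALES table.
jobs = [
    LoadJob("JOB_020", "daily", "2024-01-14", "sales_eu.dat", "SALES"),
    LoadJob("JOB_010", "daily", "2024-01-14", "sales_us.dat", "SALES"),
    LoadJob("JOB_030", "weekly", "2024-01-08", "customers.dat", "CUSTOMER"),
]

touching = jobs_by_table(jobs)
print(touching["SALES"])  # ['JOB_010', 'JOB_020']
```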

3. Report Details: Reports may not be on everyone’s list for data profiling projects, and teams that rely on ad-hoc analysis may not use them at all. However, they are an excellent way to determine the value of your data. You can analyze who is using the data, how often it is used, the type of user, the type of decisions being made, how much data is used, and more. Data usage through reporting can help identify some ROI.

Some report details to consider:
Report Name
Report Purpose
Report Owner
Report–Table Touch Points
Report Usage Details (last user, frequency used)
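
To give a feel for how report details translate into data usage, here is a minimal sketch that rolls report runs up to the tables they touch. The report names, table names, users, and the shape of the run log are all invented for the example; a real reporting platform would expose this differently.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical inputs: the tables each report reads, and a log of (report, user) runs.
report_tables: Dict[str, List[str]] = {
    "Weekly Sales": ["SALES", "CUSTOMER"],
    "Churn Summary": ["CUSTOMER"],
}
run_log: List[Tuple[str, str]] = [
    ("Weekly Sales", "ana"),
    ("Weekly Sales", "raj"),
    ("Churn Summary", "ana"),
    ("Weekly Sales", "ana"),
]


def table_usage(
    report_tables: Dict[str, List[str]],
    run_log: List[Tuple[str, str]],
) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Return (report runs per table, distinct users per table)."""
    runs: Counter = Counter()
    users: Dict[str, Set[str]] = {}
    for report, user in run_log:
        # Every run of a report counts as one use of each table it touches.
        for table in report_tables.get(report, []):
            runs[table] += 1
            users.setdefault(table, set()).add(user)
    return runs, users


runs, users = table_usage(report_tables, run_log)
print(runs["SALES"], runs["CUSTOMER"])  # 3 4
print(sorted(users["CUSTOMER"]))        # ['ana', 'raj']
```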

4. Owner Details: Some would argue that ownership has nothing to do with a data profile, and I would say you are probably right. However, suppose you are a support analyst and you are asked, “Why is the data wrong or missing?”, and you don’t have the documentation or other support material. Knowing who the owner of the data is, and having that in your data profile, means you have just discovered speed dialing. You now have someone to discuss the issue with, and someone who is responsible and accountable for the data. This information is virtually priceless to second-line support.

Some owner details to consider:
Data Owner
Process Owner
Contact Information

5. Lineage Details: Lineage details offer a unique perspective on the data. In large organizations that have legacy systems and siloed environments, gathering them can become very cumbersome. In some situations the data may even pass through an individual PC for modifications (a cold chill just went down my spine) before it reaches you, the data analyst. In such organizations it is common for a single piece of data, once first entered, to run through not 1, not 2, not 3, but 4 legacy systems and subsystems before coming to rest in an Enterprise Data Warehouse. Having this information in your pocket allows you to better understand and communicate with different support teams, front-line users, business partners and more. Please note that I use the term database below, but it can refer to any decision point in the data process flow/workflow.

Some lineage details to consider:
Database/data warehouse name
Database Owner
Database Owner Contact Information
Database table names
Database attribute names (in many cases the attribute you’re looking at will not have had the same name throughout its lineage)
Attribute details (size, type, definitions)
First Entry Points
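
The lineage details above can be pictured as a chain of hops, each pointing at the hop it was loaded from, which you follow back until you reach the first entry point. The sketch below is only an illustration under that assumption: the `LineageHop` record, the `trace_to_entry_point` helper, and every system, table, attribute, and team name are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LineageHop:
    """One stop in an attribute's journey (hypothetical layout)."""
    database: str
    table: str
    attribute: str
    owner: str


def trace_to_entry_point(
    start: LineageHop,
    upstream: Dict[LineageHop, LineageHop],
) -> List[LineageHop]:
    """Follow upstream links from start until a hop has no recorded source."""
    path = [start]
    while path[-1] in upstream:
        source = upstream[path[-1]]
        if source in path:
            raise ValueError("lineage loops back on itself")
        path.append(source)
    return path


# Made-up lineage: the same value carries a different name in every system.
edw = LineageHop("EDW", "SALES_FACT", "cust_id", "BI team")
ods = LineageHop("ODS", "ORDERS", "customer_no", "Integration team")
billing = LineageHop("BILLING", "INVOICE", "CUSTNO", "Finance IT")
entry = LineageHop("ORDER_ENTRY", "ORDER", "CUST_NUM", "Sales operations")

# Each hop maps to the hop it was loaded from; the entry point has no source.
upstream = {edw: ods, ods: billing, billing: entry}

path = trace_to_entry_point(edw, upstream)
print([hop.attribute for hop in path])
# ['cust_id', 'customer_no', 'CUSTNO', 'CUST_NUM']
print(path[-1].owner)  # Sales operations
```

Recording the owner on each hop ties lineage back to the owner details, so the trace also tells you whom to call at each stop.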

Once you have completed your accuracy and completeness verifications and gathered all of this often-overlooked metadata, you can say your data profiling project is a 5-star effort. Remember, you will need to ensure that the details you gather are themselves accurate.
Having all this information will help not only you, the data quality analyst, but also business analysts and even support analysts. Your organization will be the better for it in the long run.
