SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

Introduction to Data Lineage

zygimantas

Sophisticated modern businesses like banks and insurers are data-rich. Data is fundamental to their business effectiveness and efficiency.

However, data is not only relevant to the business processes that create it. Many classes of data are essential beyond their primary business purpose: for internal reporting and analysis, for use by other applications, or for exchange with third parties. Examples include producing consolidated reporting from distributed sales applications, feeding a general ledger and producing regulatory reports.

Data is copied from application and data silos into reporting and data integration solutions such as data warehouses and data marts. Increasingly, external data is integrated with internal data. In financial services, instrument data is purchased and integrated before onward distribution to internal systems for trading and analysis. In retail, credit risk data is consumed and used for customer sales and profiling.

All this data movement requires convoluted networks of data extraction, transformation and loading to achieve the desired business outcomes. Many millions of individual data items are processed and moved every day. Huge legacy IT estates often support numerous business requirements in what is sometimes referred to as the ‘integration hairball’. The processes and IT systems that join together silos of disparate data are often incompatible and poorly documented. Taken together, these factors mean that some data will end up inaccurate or misleading, and the business processes and decisions that depend on it will lose effectiveness.

Data lineage is the process of understanding, documenting and visualising data as it moves from origination to consumption. It includes tracking data upstream from its end point to confirm that it is accurate and consistent. More generally, it covers the path from origin to destination, both forwards and backwards, starting from any point along that path.
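One way to picture this is as a directed graph in which each edge points from a source dataset to a dataset derived from it. The sketch below is a minimal illustration under that assumption, not a description of any particular lineage tool. The dataset names (`sales_app_eu`, `warehouse.sales` and so on) and the helper functions are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source dataset, derived dataset).
EDGES = [
    ("sales_app_eu", "warehouse.sales"),
    ("sales_app_us", "warehouse.sales"),
    ("warehouse.sales", "mart.consolidated_sales"),
    ("warehouse.sales", "general_ledger"),
    ("general_ledger", "regulatory_report"),
]


def build_index(edges):
    """Index the edges in both directions so either walk is cheap."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbours):
    """Breadth-first walk: every dataset reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_index(EDGES)


def trace_back(dataset):
    """All datasets that feed `dataset`, directly or indirectly."""
    return reachable(dataset, UPSTREAM)


def trace_forward(dataset):
    """All datasets derived from `dataset`, directly or indirectly."""
    return reachable(dataset, DOWNSTREAM)
```

Calling `trace_back("regulatory_report")` walks towards the originating systems, while `trace_forward("sales_app_eu")` lists everything that consumes that application's data. Both calls can start from any node, which mirrors the idea of looking along the path from any point.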

Data lineage is used to help ensure that data comes from a reliable source, is transformed appropriately and is loaded correctly to its designated location. It is of great importance in a business environment where key decisions rely on accurate information. Without appropriate technology and processes in place, tracking data can be virtually impossible or, at the very least, a costly and time-consuming endeavour.

The main use cases for data lineage are analysing data errors, analysing the impact on downstream consumers of changes to data structures or systems, and reporting data provenance to regulators. The following use cases help to explain:

Error resolution – a business analyst is trying to understand an unfamiliar metric in a generated BI report. The analyst reports the problem to IT support or the help desk, and an IT resource looks over the source code or specifications to work out where the information came from and what transformations it has gone through. It can take days to solve this problem, time that could be spent more productively with appropriate tooling.
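If lineage metadata is captured in a queryable form, the manual trace described above could in principle be automated. The following sketch assumes a hypothetical catalogue, `LINEAGE`, that records for each field its source fields and a description of the transformation applied. The field names and transformations are invented for illustration only.

```python
# Hypothetical lineage catalogue: field -> (source fields, transformation).
LINEAGE = {
    "bi_report.net_revenue": (
        ["mart.gross_revenue", "mart.refunds"],
        "gross_revenue - refunds",
    ),
    "mart.gross_revenue": (
        ["warehouse.order_lines"],
        "SUM(quantity * unit_price)",
    ),
    "mart.refunds": (
        ["warehouse.refund_events"],
        "SUM(amount)",
    ),
}


def explain(field, depth=0, seen=None):
    """Return indented lines describing how `field` was derived."""
    seen = set() if seen is None else seen
    indent = "  " * depth
    if field in seen:
        # Guard against cycles and repeated branches in the catalogue.
        return [f"{indent}{field} (already shown)"]
    seen.add(field)
    if field not in LINEAGE:
        # No recorded derivation: treat the field as coming from a source system.
        return [f"{indent}{field} (source system)"]
    sources, transformation = LINEAGE[field]
    lines = [f"{indent}{field} = {transformation}"]
    for src in sources:
        lines.extend(explain(src, depth + 1, seen))
    return lines
```

Printing `"\n".join(explain("bi_report.net_revenue"))` gives the analyst an indented view of each hop and the transformation recorded for it, without anyone reading source code or specifications.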

Impact analysis – business data requirements change frequently, and the IT systems that deliver the data are in a constant cycle of development, testing and release. The capability to analyse and visualise data lineage shows which downstream consumers a proposed change would affect, and so permits greater control and governance of the change cycle.

Regulatory reporting – the financial crisis brought in a wide range of new regulations intended to identify trouble early and to help financial institutions become better at managing risk. Regulators began to highlight the importance of financial institutions being able to validate the accuracy of their compliance reports. This has heightened the importance of data lineage: regulators are demanding transparency and mandating that data lineage is documented and reported. The enforcement of data lineage is an important milestone in this industry because, historically, it was more important to produce reports on time than to demonstrate whether the data used for those reports was accurate and consistent. Modern data tools can automate much of this workload, which would improve the data lifecycle, reduce human error and free up funds set aside for compliance breaches so that they can be invested in more lucrative ventures.
