SmartData Collective
Introduction to Data Lineage

zygimantas

Sophisticated modern businesses like banks and insurers are data rich. Data is fundamental to their business effectiveness and efficiency.

However, data is not just relevant to the business processes that create it. Many classes of data are essential outside of their main business purpose: for internal reporting and analysis, for use by other applications or for exchange with third parties. Examples include producing consolidated reporting from distributed sales applications, feeding a general ledger and producing regulatory reports.

Data is copied from application and data silos into reporting and data integration solutions such as data warehouses and data marts. Increasingly, external data is integrated with internal data. In financial services, instrument data is purchased and integrated before onward distribution to internal systems for trading and analysis. In retail, credit risk data is consumed and used for customer sales and profiling.

All this data movement requires convoluted networks of data extraction, transformation and loading to achieve the desired business outcomes. Many millions of individual data items will be processed and moved every day. There are often huge legacy IT estates that support numerous business requirements in what is sometimes referred to as the ‘integration hairball’. The processes and IT systems that join together silos of disparate data are often incompatible and poorly documented. All these factors mean that some data will end up being inaccurate or misleading to the business, and its processes and decisions will lose effectiveness.

Data lineage is the process of understanding, documenting and visualising this data as it goes from origination to consumption. It is often described as tracking data upstream from its end point to ensure the data is accurate and consistent, but it covers the origin-to-destination path both forwards and backwards, and at any point along that path.
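
To make the forward and backward view concrete, here is a minimal sketch that models lineage as a directed graph and walks it in either direction. The class name `LineageGraph`, its methods and the system names (`sales_app_eu`, `sales_warehouse` and so on) are hypothetical and chosen purely for illustration; real lineage tooling will capture far more detail than this.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph: a flow source -> target means target is fed by source."""

    def __init__(self):
        self._downstream = defaultdict(set)  # node -> nodes it feeds
        self._upstream = defaultdict(set)    # node -> nodes that feed it

    def add_flow(self, source, target):
        """Record that data moves from source to target."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, neighbours):
        """Breadth-first walk returning every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node):
        """Everything the node is ultimately derived from (backward lineage)."""
        return self._walk(node, self._upstream)

    def downstream(self, node):
        """Everything that ultimately consumes the node (forward lineage)."""
        return self._walk(node, self._downstream)


# Hypothetical estate: two sales applications feed a warehouse,
# which in turn feeds a general ledger and a regulatory report.
graph = LineageGraph()
graph.add_flow("sales_app_eu", "sales_warehouse")
graph.add_flow("sales_app_us", "sales_warehouse")
graph.add_flow("sales_warehouse", "general_ledger")
graph.add_flow("sales_warehouse", "regulatory_report")

print(sorted(graph.upstream("regulatory_report")))
print(sorted(graph.downstream("sales_app_eu")))
```

The same structure serves both directions: walking upstream from a report shows where its data originated, while walking downstream from a source shows which consumers a change to that source could affect.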

Data lineage is used to help govern and verify that data comes from a reliable source, is transformed appropriately and is loaded correctly to its designated location. Data lineage has great importance in a business environment where key decisions rely on accurate information. Without appropriate technology and processes in place, tracking data can be virtually impossible or, at the very least, a costly and time-consuming endeavour.

The main use cases where data lineage is an essential tool are analysing data errors, analysing the impact on downstream consumers of changes to data structures or systems, and reporting data provenance to regulators. These use cases help to explain:

Error resolution – a business analyst is trying to understand an unknown metric in a generated BI report. The analyst reports the problem to IT support or the help desk, and an IT resource looks over the source code or specifications to work out where the information came from and what transformations it has gone through. It can take days to solve this problem, time that could have been spent more efficiently with appropriate tooling.
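
As a rough illustration of what such tooling might do, the sketch below traces a report metric back to its origin and lists each transformation applied along the way. The field names, the `Step` record and the `trace_back` helper are all hypothetical, and the sketch assumes each field is produced by exactly one upstream step, which is a simplification of real pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded movement of data, with the transformation applied."""
    source: str
    target: str
    transformation: str


# Hypothetical lineage records for a single BI metric.
STEPS = [
    Step("crm.orders.amount", "staging.orders.amount_gbp", "convert currency to GBP"),
    Step("staging.orders.amount_gbp", "warehouse.sales.net_amount", "subtract refunds"),
    Step("warehouse.sales.net_amount", "bi_report.net_revenue", "sum by month"),
]


def trace_back(field, steps):
    """Return the steps leading to `field`, ordered from origin to end point.

    Assumes a single producing step per field (an illustrative simplification).
    """
    producers = {step.target: step for step in steps}
    path, seen = [], set()
    # Follow producers backwards until a field with no recorded producer
    # is reached; `seen` guards against accidental cycles in the records.
    while field in producers and field not in seen:
        seen.add(field)
        step = producers[field]
        path.append(step)
        field = step.source
    path.reverse()
    return path


for step in trace_back("bi_report.net_revenue", STEPS):
    print(f"{step.source} -> {step.target}: {step.transformation}")
```

With lineage recorded in this form, the question "where did this number come from?" becomes a lookup rather than a search through source code and specifications.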

Impact analysis – business data requirements are frequently changing and the IT systems that deliver the data will be in a constant cycle of development, testing and release. Having a capability to analyse and visualise data lineage permits greater control and governance of the change cycle.

Regulatory reporting – the financial crisis brought in a wide range of new regulations with the purpose of identifying trouble early and helping financial institutions become better at managing risk. Regulators started highlighting the importance of financial institutions being able to validate the accuracy of compliance reports. This has heightened the importance of data lineage: regulators are demanding transparency and mandating that data lineage is documented and reported. The enforcement of data lineage is an important milestone in this industry, as historically it was more important to produce reports on time than to demonstrate that the data used for those reports was accurate and consistent. Modern data tools can automate much of this workload, which would improve the data lifecycle, reduce human error and free up funds set aside for compliance breaches so that they can be invested in more lucrative ventures.
