© 2008-25 SmartData Collective. All Rights Reserved.
Data Integration Roadmap to Support Big Data and Analytics

Raju Bodapati

Traditional extract, transform and load (ETL) has existed since data warehousing emerged to move data out of legacy mainframe applications. As a result, the focus of ETL has been moving data from files into relational or dimensional databases for consumption by reporting engines. Even today, when most attention goes to data visualization, analytics and business intelligence, data professionals recognize effective ETL engines as the backbone. However, with widespread data access points, diverse data sources and unstructured data, expectations on data connection interfaces have shifted from traditional data movement toward data integration.
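
To make the traditional data-movement pattern concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production ETL engine: the file contents, the `sales` table and its columns are hypothetical, and SQLite stands in for the relational or dimensional target that a reporting engine would query.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read delimited rows from a flat file (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: trim text fields, upper-case region codes and cast amounts to numbers."""
    return [
        (r["customer_id"].strip(), r["region"].strip().upper(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: insert the cleaned records into a relational table for reporting engines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    # Hypothetical source file contents.
    source = "customer_id,region,amount\nC1, east ,10.5\nC2,west,4\n"
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(source)))
    print(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall())
```

Each stage only moves and reshapes data; nothing here feeds quality problems back to anyone or connects the data to the rest of the organization, which is the gap the roadmap below addresses.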

It is inspiring to read the new TDWI checklist report by David Loshin, “Satisfying New Requirements for Data Integration”, which briefly highlights the changing demands on data integration. The report lists seven demands: a) increase performance and efficiency, b) integrate the cloud, c) protect information in the integration layer, d) embed master data services, e) process big data and enterprise data, f) satisfy real-time demands, and g) develop data quality and data governance policies and practices.

While Loshin identifies the changing demands on legacy ETL platforms very well, the publication still presents a desired future state rather than a path organizations can follow to build a data integration framework that can sustain evolving needs. The following five-step roadmap lists specific measures organizations can take as they move toward that future state.

Step 1: get the foundation strong

Establishing a strong data quality and governance organization is perhaps the first foundation needed by data organizations aspiring to grow from a mere data moving and storing entity into an information enablement engine. The data integration platform should enforce the policies and practices that organization establishes. ETL infrastructure gets the first look at the nature and volume of source systems' data quality problems as those systems are integrated with the rest of the organization. The traditional ETL approach has been to find workarounds that push the data through by making tradeoffs. Instead, these chokepoints, also known as fault tolerance gates, should be re-examined so that the data quality and integrity problems they reveal are fed into the data governance organization. This does not mean the organization must resolve every data quality issue before moving to the next step; it means establishing visibility and proper governance for processing the organization's data integrity issues.
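
One way to picture a fault tolerance gate that reports to governance instead of quietly working around bad data is the sketch below. It is a hypothetical illustration: the rule names, the `QualityIssue` record and the in-memory issue log are assumptions standing in for whatever validation framework and governance tracking tool an organization actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QualityIssue:
    """One data quality problem surfaced at the gate, routed to data governance for review."""
    source: str
    rule: str
    record_key: str


# Hypothetical validation rules: rule name -> predicate that must hold for a record.
RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}


def quality_gate(records, source, issue_log):
    """Split records into accepted and quarantined, logging every rule violation."""
    accepted, quarantined = [], []
    for i, record in enumerate(records):
        failed = [name for name, check in RULES.items() if not check(record)]
        key = str(record.get("customer_id") or f"row {i}")
        for name in failed:
            issue_log.append(QualityIssue(source, name, key))
        if failed:
            quarantined.append(record)
        else:
            accepted.append(record)
    return accepted, quarantined


def governance_summary(issue_log):
    """Count issues per (source, rule) so the governance team can prioritize fixes at the source."""
    return Counter((issue.source, issue.rule) for issue in issue_log)
```

Good records keep flowing while failures are quarantined and counted per source and rule, which matches the point above: the organization does not have to fix everything first, but it gains visibility into where integrity problems originate.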

Step 2: get serious about information security 

Traditionally, ETL engines land sensitive data and do not always discard it from logs and temporary staging areas after use. Access control, authorization and authentication are compromised when multiple people can use the same service accounts. Also, when production data is refreshed into test or development environments, scrubbing it to de-sensitize it is often skipped. Information security, especially within the ETL world, needs thorough audits and controls to ensure security policies are enforced. Without them, enabling a widespread data integration infrastructure can multiply these vulnerabilities and could conceivably be fatal to the organization.
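
Two of the controls mentioned here, scrubbing production data before a test refresh and not leaving sensitive data behind in staging areas, can be sketched as follows. Keyed hashing (HMAC-SHA256) is swapped in as one common pseudonymization technique; the column names, the key handling and the staging file layout are assumptions for illustration only.

```python
import hashlib
import hmac
import json
import os
import tempfile

# Hypothetical set of sensitive columns; in practice it would come from the security policy.
SENSITIVE_COLUMNS = {"ssn", "email"}


def pseudonymize(value, key):
    """Replace a value with a truncated HMAC-SHA256 token: the same input always yields
    the same token (so joins still work), but tokens cannot be recomputed without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scrub_for_test(rows, key):
    """De-sensitize production rows before they are refreshed into a test environment."""
    return [
        {col: pseudonymize(val, key) if col in SENSITIVE_COLUMNS else val
         for col, val in row.items()}
        for row in rows
    ]


def stage_and_discard(rows, process):
    """Land rows in a temporary staging file, run `process` on its path, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".stage")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
        return process(path)
    finally:
        # Delete the staged copy even if processing fails, so sensitive data is not left behind.
        # Note: os.remove unlinks the file; it does not overwrite the underlying disk blocks.
        os.remove(path)
```

The pseudonymization key should live outside the test environment, for example in a secrets manager; otherwise anyone with test access could recompute tokens for guessed values and undo the scrubbing.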

Step 3: smarter master data and graceful validation services

Most master data management (MDM) implementations remain static and user managed. Used well, however, ETL infrastructure can implement an active and evolving MDM system. Therefore, one of the first steps organizations are taking is to integrate MDM tools and methods with their ETL engines. ETL engines are also increasingly integrating with geospatial validation software or data mapping / translation engines to enforce data integrity. This lets interfaces handle bad data more gracefully instead of becoming chokepoints. There are always strong arguments about what ETL should or should not do to data. However, ETL's traditional role of moving data without touching it is being replaced by integrating data into the organizational information web. These steps can lay the path for ETL engines as they form the organization's data integration architecture.
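
The difference between a choke and a graceful interface can be sketched as an integration step that matches incoming records to master data and standardizes codes through a translation table, routing anything unresolved to data stewards instead of failing the load. The master records, the matching rule and the state-code table are all hypothetical; real MDM tools use far richer matching.

```python
import re

# Hypothetical master data: canonical customer records keyed by a normalized match key.
MASTER = {"acme": {"master_id": "M001", "name": "Acme Corp"}}

# Hypothetical translation table mapping source-specific state spellings to a standard code.
STATE_CODES = {"texas": "TX", "tx": "TX", "california": "CA", "calif.": "CA", "ca": "CA"}

LEGAL_SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|ltd)\b")


def match_key(name):
    """Normalize a company name for matching: lowercase, drop punctuation and legal suffixes."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = LEGAL_SUFFIXES.sub("", key)
    return " ".join(key.split())


def integrate(record, stewardship_queue):
    """Attach a master ID and a standard state code; queue unresolved records instead of failing."""
    result = dict(record)
    master = MASTER.get(match_key(record["name"]))
    result["master_id"] = master["master_id"] if master else None
    raw_state = record.get("state", "")
    normalized = raw_state.strip().lower()
    result["state"] = STATE_CODES.get(normalized, raw_state)
    if master is None or normalized not in STATE_CODES:
        # Graceful path: the record still flows on, and data stewards review the gap,
        # which is how the master data and translation tables keep evolving.
        stewardship_queue.append(record)
    return result
```

Here "ACME Corp." in Texas resolves to master record M001 with state code TX, while an unknown company passes through unchanged and lands in the stewardship queue for review.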

Step 4: upgrade the data integration infrastructure with the future in mind

When budgeting ETL infrastructure, most organizations use feedback mechanisms (what went wrong in the past) rather than feed-forward mechanisms (what needs to go right in the future). As a result, businesses often find themselves looking for shortcuts to meet their changing demands with unstructured infrastructure. Traditional ETL environments then always lag behind, catching up with the damage to data and process integrity caused by such short-sighted temporary investments. Therefore, a major part of transforming an ETL organization into a data integration organization involves strategic investment decisions about the fundamental infrastructure the future establishment will need. For example, when integration with the cloud or real-time active data warehousing is on the horizon, the infrastructure investment decisions have to be made now rather than at the last hour. This calls for a program management mindset, not an infrastructure support mindset, while budgeting.

Step 5: enable expanded data integration

Organizations that have made progress on the previous steps can then consider how their data integration infrastructure will meet needs such as cloud integration, big data analytics, and mobile / self-service business intelligence. As Loshin explained in his report, mounds of structured data, unstructured data and big data, together with advances in cloud technology and end-user-driven needs such as mobile BI, self-service BI, real-time reporting and advanced visualization techniques, are rapidly expanding the need for data integration competence well beyond what traditional data movement ETL engines offer. At this stage, the data integration architecture has the necessary security framework, graceful validation to handle unexpected behavior in data feeds, the ability to integrate with and build organizational master data, and the strategic programs in place to support the organizational enablement demanded of it.
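
A common way to absorb new kinds of sources without reworking everything downstream is to map each one, through its own adapter, into a shared record envelope. The sketch below assumes three hypothetical feed formats (CSV, JSON lines and free text) and an `Envelope` type invented for illustration; it shows the shape of the architecture, not any particular product's connector API.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Envelope:
    """Common record format every source is mapped into before downstream integration."""
    source: str
    kind: str  # "structured", "semi-structured" or "unstructured"
    payload: dict


def from_csv(source: str, text: str) -> Iterable[Envelope]:
    for row in csv.DictReader(io.StringIO(text)):
        yield Envelope(source, "structured", dict(row))


def from_json_lines(source: str, text: str) -> Iterable[Envelope]:
    for line in text.splitlines():
        if line.strip():  # skip blank lines between JSON documents
            yield Envelope(source, "semi-structured", json.loads(line))


def from_text(source: str, text: str) -> Iterable[Envelope]:
    # Unstructured content is kept whole, with minimal derived metadata for later analytics.
    yield Envelope(source, "unstructured", {"text": text, "word_count": len(text.split())})


# Hypothetical registry: new source types plug in here without changing downstream consumers.
ADAPTERS: Dict[str, Callable[[str, str], Iterable[Envelope]]] = {
    "csv": from_csv,
    "jsonl": from_json_lines,
    "text": from_text,
}


def ingest(feeds: List[Tuple[str, str, str]]) -> List[Envelope]:
    """Route each (source_name, format, raw_text) feed through its adapter into one stream."""
    out: List[Envelope] = []
    for source, fmt, raw in feeds:
        out.extend(ADAPTERS[fmt](source, raw))
    return out
```

Security checks, quality gates and master data matching from the earlier steps can then be applied once, to the envelope stream, rather than separately for every new source.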

Summary

Traditional ETL infrastructure and processes need a clear roadmap that considers expected future demands rather than reacting to issues and challenges faced in the past. Building a data integration infrastructure that can support future business needs should be managed as a program with step-by-step evolution. Data integration infrastructure should support new data sources such as the cloud, unstructured data and big data. It should also support real-time data needs, mobile business intelligence, information access and performance demands, information security needs, and analytics. The steps described in this article can provide a vision for the roadmap as traditional ETL infrastructures transition into data integration service providers.
