SmartData Collective

Virtualization comes of age… again!

Barry Devlin

Way back in 1991, when IBM announced the Information Warehouse Framework, one aspect of the content came as a shock to most people who were promoting data warehousing then.  (There were not too many of us at the time to be shocked… as far as I know, no one had yet claimed paternity of data warehousing, and the first popular book on the topic was still a year away.)  The shock was that the announcement included the concept of access to heterogeneous data, to be supported through an alliance with Information Builders Inc., using their product EDA/SQL.  The accepted wisdom in data warehousing at the time and for many years since was that heterogeneous data must be cleansed, reconciled and loaded into the warehouse via ETL tooling and accessed from there.  Information Warehouse was not a great success for IBM, and access to heterogeneous data largely faded from awareness among data warehousing professionals.  That made a lot of sense back then.  Heterogeneous data was very heterogeneous, very complex and very susceptible to performance problems when accessed in an unplanned manner.  “Leave it alone!” was the sensible advice.

Ten years later, I was again faced with the concept of heterogeneous data access as IBM began the work that was later announced as IBM DB2 Information Integrator.  This time, the starting point was federated access to data, initially across relational systems, but with a clear direction to include all types of data.  Again, the market wasn’t really ready for the concept, although there was a wider degree of acceptance of the idea and a number of early adopters began to experiment seriously with implementation. Most data warehousing experts still shook their heads in disbelief…

Fast forward another ten years and access to heterogeneous data is back on the agenda big time, this time under the name virtualization, and yesterday’s launch of Composite 6 ups the ante again. (It may be of interest to note that Composite was founded in 2002, right in the middle of the last wave of interest.) While there is lots of fascinating stuff in the release about improving performance, caching and governance, my attention was drawn particularly to the inclusion of “big data” integration support. My concern was how Composite could understand and reliably use the variety of data types, elements and so on that are typically present in Hadoop files.

My contention is that over the years since 1991, heterogeneous data sources have, generally speaking, become better defined, less complex in terms of structure and content, more easily accessed, and less prevalent. Until the advent of big data, that is. In data management terms, big data is like a giant step backwards from modern suburbia to the Wild West. Schema: why bother? Metadata: who needs it when it will be out of date in a day? Governance: programmers can handle it!

But when I put the question to Dave Besemer, CTO of Composite, the answer I got proved very enlightening.  Not just about Composite’s approach but also about what is going on, perhaps somewhat by stealth, in the world of big data.  Basically, Dave said that Composite accesses big data only via Hive, which provides the basic structural metadata required for virtualization.  And Hive?  Well, Hive defines itself on its own website as, wait for it: “…a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL…”

So, as Robert Browning wrote, “God’s in his heaven, all’s right with the world”, at least if you are a data management fan. The big data folks do recognize the value of data management (Hive has been around since 2009) despite some of the NoSQL hype that still continues to turn up in the press. That’s not to say that Hive needs to be put in front of every set of Hadoop files. There’s a whole world of distributed Hadoop data that is so transient and/or so specialized that the only sensible way to use it is via a programmatic interface. But Composite isn’t going after that stuff; they are focusing on the better defined and managed segment of big data. And that makes perfect sense.

But there is still a question in my mind that the broader IT community needs to answer: How are we going to manage and handle the other, much larger segment of big data? Pat Helland’s article “If You Have Too Much Data, then ‘Good Enough’ Is Good Enough” in ACM Queue provides some food for thought.

Oh, and by the way, there are a few data warehouse eminences grises who still proclaim that virtualization is evil and that all data has to go through the data warehouse…  Perhaps they’re waiting for the fourth wave?
