Data Warehousing

Virtualization comes of age… again!

Barry Devlin
6 Min Read

Way back in 1991, when IBM announced the Information Warehouse Framework, one aspect of the content came as a shock to most people who were promoting data warehousing then.  (There were not too many of us at the time to be shocked… as far as I know, no one had yet claimed paternity of data warehousing, and the first popular book on the topic was still a year away.)  The shock was that the announcement included the concept of access to heterogeneous data, to be supported through an alliance with Information Builders Inc., using their product EDA/SQL.  The accepted wisdom in data warehousing at the time and for many years since was that heterogeneous data must be cleansed, reconciled and loaded into the warehouse via ETL tooling and accessed from there.  Information Warehouse was not a great success for IBM, and access to heterogeneous data largely faded from awareness among data warehousing professionals.  That made a lot of sense back then.  Heterogeneous data was very heterogeneous, very complex and very susceptible to performance problems when accessed in an unplanned manner.  “Leave it alone!” was the sensible advice.

Ten years later, I was again faced with the concept of heterogeneous data access as IBM began the work that was later announced as IBM DB2 Information Integrator.  This time, the starting point was federated access to data, initially across relational systems, but with a clear direction to include all types of data.  Again, the market wasn’t really ready for the concept, although there was a wider degree of acceptance of the idea and a number of early adopters began to experiment seriously with implementation. Most data warehousing experts still shook their heads in disbelief…

Fast forward another ten years, and access to heterogeneous data is back on the agenda in a big way, this time under the name virtualization; the launch of Composite 6 yesterday ups the ante again.  (It may be of interest to note that Composite was founded in 2002, right in the middle of the last wave of interest.)  While there is lots of fascinating material in the release about improved performance, caching and governance, my attention was drawn particularly to the inclusion of “big data” integration support.  And my concern was how Composite could understand and reliably use the variety of data types, elements and so on that are typically present in Hadoop files.

My contention is that in the years since 1991, heterogeneous data sources have, generally speaking, become better defined, less complex in structure and content, more easily accessed, and less prevalent.  Until the advent of big data, that is.  In data management terms, big data is a giant step backwards from modern suburbia to the Wild West: schema? Why bother. Metadata? Who needs it, when it will be out of date within a day? Governance? The programmers can handle it!

But when I put the question to Dave Besemer, CTO of Composite, the answer I got proved very enlightening.  Not just about Composite’s approach but also about what is going on, perhaps somewhat by stealth, in the world of big data.  Basically, Dave said that Composite accesses big data only via Hive, which provides the basic structural metadata required for virtualization.  And Hive?  Well, Hive defines itself on its own website as, wait for it: “…a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL…”

So, as Robert Browning wrote, “God’s in his heaven, all’s right with the world,” if you are a data management fan.  The big data folks do recognize the value of data management (Hive has been around since 2009), despite some of the NoSQL hype that still turns up in the press.  That’s not to say that Hive needs to be put in front of every set of Hadoop files.  There’s a whole world of distributed Hadoop data that is so transient and/or so specialized that the only sensible way to use it is via a programmatic interface.  But Composite isn’t going after that stuff; they are focusing on the better defined and managed segment of big data.  And that makes perfect sense.

But there is still a question in my mind that the broader IT community needs to answer: how are we going to manage and handle the other, much larger segment of big data?  Pat Helland’s article “If You Have Too Much Data, then ‘Good Enough’ Is Good Enough” in ACM Queue provides some food for thought.

Oh, and by the way, there are a few data warehouse éminences grises who still proclaim that virtualization is evil and that all data has to go through the data warehouse…  Perhaps they’re waiting for the fourth wave?
