Data Warehousing

Virtualization comes of age… again!

Barry Devlin
6 Min Read

Way back in 1991, when IBM announced the Information Warehouse Framework, one aspect of the content came as a shock to most people who were promoting data warehousing then. (There were not too many of us at the time to be shocked… as far as I know, no one had yet claimed paternity of data warehousing, and the first popular book on the topic was still a year away.) The shock was that the announcement included the concept of access to heterogeneous data, to be supported through an alliance with Information Builders Inc., using their product EDA/SQL. The accepted wisdom in data warehousing at the time, and for many years since, was that heterogeneous data must be cleansed, reconciled and loaded into the warehouse via ETL tooling, and accessed from there. Information Warehouse was not a great success for IBM, and access to heterogeneous data largely faded from awareness among data warehousing professionals. That made a lot of sense back then: heterogeneous data was very heterogeneous, very complex and very susceptible to performance problems when accessed in an unplanned manner. “Leave it alone!” was the sensible advice.

Ten years later, I was again faced with the concept of heterogeneous data access as IBM began the work that was later announced as IBM DB2 Information Integrator.  This time, the starting point was federated access to data, initially across relational systems, but with a clear direction to include all types of data.  Again, the market wasn’t really ready for the concept, although there was a wider degree of acceptance of the idea and a number of early adopters began to experiment seriously with implementation. Most data warehousing experts still shook their heads in disbelief…

Fast forward another ten years, and access to heterogeneous data is back on the agenda in a big way, this time under the name virtualization, and yesterday’s launch of Composite 6 ups the ante again. (It may be of interest to note that Composite was founded in 2002, right in the middle of the last wave of interest.) While there is lots of fascinating stuff in the release about improving performance, caching and governance, my attention was drawn particularly to the inclusion of “big data” integration support. And my concern was how Composite could understand and reliably use the variety of data types, elements and so on that are typically present in Hadoop files.

My contention is that over the years since 1991, heterogeneous data sources have, generally speaking, become better defined, less complex in terms of structure and content, more easily accessed, and less prevalent. Until the advent of big data, that is. In data management terms, big data is like a giant step backwards from modern suburbia to the Wild West. Schema? Why bother. Metadata? Who needs it when it will be out of date in a day? Governance? The programmers can handle it!

But when I put the question to Dave Besemer, CTO of Composite, the answer I got proved very enlightening, not just about Composite’s approach but also about what is going on, perhaps somewhat by stealth, in the world of big data. Basically, Dave said that Composite accesses big data only via Hive, which provides the basic structural metadata required for virtualization. And Hive? Well, Hive defines itself on its own website as, wait for it: “…a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL…”
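
To make that concrete, here is a minimal sketch of what “projecting structure onto” Hadoop data looks like in Hive. The table name, columns and file location below are hypothetical, invented purely for illustration; the point is that a Hive external table declares a schema over files already sitting in Hadoop, which is precisely the structural metadata a virtualization layer can pick up, and HiveQL then queries those files like any warehouse table.

    -- Hypothetical example: declare a schema over raw tab-delimited
    -- clickstream files in HDFS. EXTERNAL means Hive catalogues the
    -- structure without taking ownership of the underlying files.
    CREATE EXTERNAL TABLE clicks (
      user_id    STRING,
      url        STRING,
      event_time STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/raw/clicks';

    -- HiveQL: a SQL-like aggregation over those same files, no ETL step.
    SELECT url, COUNT(*) AS hits
    FROM clicks
    GROUP BY url
    ORDER BY hits DESC
    LIMIT 10;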

So, as Robert Browning wrote, “God’s in his heaven, all’s right with the world,” if you are a data management fan. The big data folks do recognize the value of data management (Hive has been around since 2009), despite some of the NoSQL hype that still continues to turn up in the press. That’s not to say that Hive needs to be put in front of every set of Hadoop files. There’s a whole world of distributed Hadoop data that is so transient and/or so specialized that the only sensible way to use it is via a programmatic interface. But Composite isn’t going after that stuff; they are focusing on the better defined and managed segment of big data. And that makes perfect sense.

But there is still a question in my mind that the broader IT community needs to answer: how are we going to manage the other, much larger segment of big data? Pat Helland’s article “If You Have Too Much Data, then ‘Good Enough’ Is Good Enough” in the ACM Journal provides some food for thought.

Oh, and by the way, there are a few data warehouse éminences grises who still proclaim that virtualization is evil and that all data has to go through the data warehouse… Perhaps they’re waiting for the fourth wave?
