Big Data Confusion Looms in the Second Half of 2013

[Image: Sad Little Elephant]

I look up, and suddenly it’s August. I’ve been heads-down for the past three months, finishing my new book, which will be available in October. The title is designed to be thought-provoking: “Business unIntelligence – Insight and Innovation beyond Analytics and Big Data”. More on that next week, starting with why I want to provoke you into thinking, and over the coming weeks, too… promise!

For now, let’s talk Big Data again. It’s a topic that remains one part frustrating and one part energizing. Let’s start with the frustration. Despite the best efforts of a number of thought leaders over the past months, the reality of big data remains stubbornly hard to pin down. The technologists continue to push the boundaries, but often in perverse ways. Take the announcement by Hortonworks at the recent Hadoop Summit, for example. As reported by Stephen Swoyer, Apache YARN (Yet Another Resource Negotiator) will make it easier to parallelize non-MapReduce jobs: “It’s the difference, argues Hortonworks founder Arun Murthy, between running applications ‘on’ and running them in Hadoop”. Sounds interesting, I thought, so I headed over to the relevant page and found this gem: “When all of the data in the enterprise is already available in HDFS, it is important to have multiple ways to process that data”. Really? How many of you believe that there is the slightest possibility that all the data in the enterprise will ever be available in HDFS? Of course, Hadoop does need a new and improved resource management approach (I suspect that studying IBM System z resource management would help). But let’s not pretend that even a copy of all enterprise data will ever be in one place. Wasn’t that the original data warehouse thinking? We are in a fully distributed and diversified IT world now. And wasn’t big data a major driver in that realization?

Now to the energizing part. When EMA and I ran our first big data survey last summer, we found that big data projects in the real world exhibit a wide range of starting points. Even when the projects are based on Hadoop (and many are not), the idea that all enterprise data should be in HDFS is simply not on the radar. With this year’s survey recently opened for input, you do have the opportunity to prove me wrong! As in last year’s work, our focus is on how businesses are translating the hype and opportunities of big data and the emerging technologies into actual projects. It spans both business and technology drivers, because these two aspects are now intimately related, a concept I call the biz-tech ecosystem. That is a foundation of Business unIntelligence and the topic of my next blog post.

Until then, I encourage you to take the big data survey soon – it will close next week – especially those of you based beyond North America. We are very interested to see the global picture.

Picture: Sad Little Elephant by Katherine Devlin