It is pretty much agreed now that the “big” in big data is a relative term. If the volumes of accessible, usable, actionable data continue to grow at current rates, then big data will be big for years to come. But it probably won’t. Going back a few years, data warehouses captured sales detail at the unit-by-time level. Size exploded when they went down to the item level, then really exploded when they went down to the customer level. These order-of-magnitude increases were episodic, not continuous. But capturing clickstream data at the detail level turned out to be a little too much for data warehousing, and big data was born.
Now, capturing and making sense of what we call big data is an episode in progress: Web data, machine-generated data, sensor data, climate data, all sorts of textual data and a host of other things. What else is there? That may be a naïve question.
Most of the notable big data has been around for a while. Science, defense and intelligence have been capturing and analyzing data we in the commercial sector can’t even fathom, with proprietary methods and specialized machines. But there is a lot of “big data” we have been looking at for quite a while; we just didn’t use all of it, or even conceive how we could, because we did not have the resources. Telemetry from all sorts of things, from process control systems to commercial aircraft in flight, has been examined in real time and then either discarded or aggregated, because the accumulation of it simply overwhelmed our ability to capture, store, examine and integrate it after the fact.
For example, telecom companies have been using Call Detail Records (CDRs) for a slew of applications (I worked on one as early as 1990), but not at the most extreme (that is, the lowest) level of detail. The data had to be aggregated or sampled, or both. Now we have the tools to look at it in excruciating detail. That’s great, isn’t it?
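To make the trade-off concrete, here is a minimal sketch (with hypothetical record fields and values) of the old roll-up approach: raw call detail is collapsed into a daily aggregate, and everything below that grain is thrown away.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call detail records: (caller, callee, start time, duration in seconds)
cdrs = [
    ("555-0101", "555-0199", datetime(2012, 3, 1, 9, 15), 240),
    ("555-0101", "555-0142", datetime(2012, 3, 1, 9, 40), 60),
    ("555-0137", "555-0101", datetime(2012, 3, 1, 21, 5), 900),
]

# The traditional approach: aggregate away detail to keep volumes manageable,
# e.g. total minutes per caller per day.
daily_minutes = defaultdict(float)
for caller, callee, start, duration in cdrs:
    daily_minutes[(caller, start.date())] += duration / 60

# The detail -- who called whom, when, for how long -- is gone; only the
# roll-up survives. Big data tooling lets us keep the raw records instead.
```

The point of the new tooling is simply that the `cdrs` list no longer has to be discarded after the roll-up is computed.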
Maybe. The coming “trough of disillusionment” that, as Gartner describes it, follows the initial hype of new technology markets will arrive because big data is still all about data, not outcomes. That it requires “data scientists” with technical skills in configuring a cluster, writing MapReduce code in Java and creating result sets that are the epitome of silos (or DOOP-marts, as I’ve named them) is reminiscent of data warehousing in the ’90s, only on steroids.
What’s needed in big data is some gentrification, the ability to use it without getting into the nuts and bolts. We suggest abstraction.
Abstraction is applied routinely to systems that are to some degree complex, and especially when they are subject to frequent change. A 2012 model car contains more processing power than most computers of only a decade ago. Driving the car, even under extreme conditions, is a perfect example of abstraction. Stepping on the gas doesn’t really pump gas to the engine; it signals the engine management system, which increases speed by sampling and directing dozens of circuits, relays and devices to achieve the desired effect subject to many constraints, such as limiting engine speed and watching the fuel-air mixture for maximum economy or lowest emissions. If the driver needed to attend to all of these things directly, he would not get out of the driveway.
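The pedal-versus-engine-management relationship can be sketched in code. This is a deliberately toy model (class and attribute names are invented for illustration): the driver sees one method, and the constraints live behind it.

```python
class EngineManagementSystem:
    """Hypothetical sketch of the low-level details the driver never sees."""

    def __init__(self):
        self.throttle = 0.0
        self.rpm = 800       # idle speed
        self.mixture = "lean"

    def request_torque(self, demand: float) -> None:
        # Clamp the demand, cap engine speed, adjust the fuel-air mixture:
        # the constraints described in the text, vastly simplified.
        self.throttle = min(max(demand, 0.0), 1.0)
        self.rpm = min(800 + int(self.throttle * 5600), 6400)
        self.mixture = "rich" if self.throttle > 0.8 else "lean"


class Car:
    def __init__(self):
        self._ems = EngineManagementSystem()

    def press_accelerator(self, pedal_position: float) -> None:
        # The driver's entire interface is one pedal; everything else
        # is delegated to the engine management system.
        self._ems.request_torque(pedal_position)
```

The design point is that `Car` exposes a single intention ("go faster") while `EngineManagementSystem` owns the mechanics, which is exactly the gentrification the article argues big data tooling needs.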
A 1971 Audi had virtually no electronics at all. A 2012 Audi S8 practically drives (and stops) itself. Today, working with big data is still a lot like driving a 1971 Audi. It will quickly (much faster than 40 years!) come to resemble riding in a 2012 S8. How quickly? Two to three years. At that point, will we still call it “big data?”
Big data still relies on at least some business users understanding the location, naming conventions and (in the best cases) semantics of the data, if not the intricacies of crafting queries. This is a huge barrier to progress. Business people need to define their work in their own terms. A business modeling environment is needed for designing and maintaining data structures. It is especially important for handling the inevitable changes in those structures, and likewise for leveraging their latent value through analytical work built on understandable models that are relevant and useful to business people.
Neil Raden, Hired Brains Research