In a Petabyte Age, Is Understanding Passé?

June 18, 2009

Analysts have estimated that the volume of data in enterprises of all sizes is doubling every two to three years. With this deluge of data, some companies are finding that it makes more sense to discover and act upon patterns (e.g., customers who buy item X also buy item Y) than to dig deeper and search for causation. In an age of cloud computing and “big data,” where correlation is often sufficient to gain business results, are we losing our thirst for knowledge and understanding?

Chris Anderson, editor of Wired magazine and author of “The Long Tail,” penned a provocative article in the July 2008 issue titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.”

Mr. Anderson claims that in “the Petabyte Age,” it’s more important for companies (and the marketers within them) to identify and act upon correlation first, and worry about context later.

For instance, he writes, “Google’s founding philosophy is that we don’t know why this page is better than that one: if the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.”

And dismissing many of the sciences that attempt to bring us understanding of the world around us, Mr. Anderson notes, “Who knows why people do what they do? The point is that they do it and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.”
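To make the correlation-first approach concrete, here is a minimal sketch of the “customers who buy item X also buy item Y” pattern. It is only an illustration: the baskets are invented toy data, and the function names `co_purchase_counts` and `confidence` are hypothetical, not taken from Mr. Anderson’s article or from any particular product.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
    {"bread", "butter", "milk"},
]

def co_purchase_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

def confidence(baskets, x, y):
    """Share of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

pair_counts = co_purchase_counts(baskets)
print(pair_counts.most_common(1))               # the most frequent pair
print(confidence(baskets, "bread", "butter"))   # how often bread buyers add butter
```

Note that nothing in this sketch says why bread buyers also buy butter. It only records that they do, which is exactly the trade-off under discussion.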

As companies collect more data about their customers, competitors and macro- and microenvironments, Mr. Anderson argues, our approach to science (hypothesize, model, test) is becoming obsolete.

Science looks for causation. Scientists hypothesize about why something works, reacts or behaves as it does, and then attempt to build a model to represent reality. The goal is to use the model to test and learn, thereby gaining an understanding of a particular phenomenon.

Modeling is not confined to the realm of physicists and quants. In marketing, managers often work alongside statisticians to build models that help predict customer behavior, such as which items customers might like to buy or which customers might churn to a competitor. Models can be tested and refined, and with enough tweaking they generally become more accurate over time. The output of those models can be used to piece together more complete customer profiles.
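The hypothesize, model, test cycle described above can be sketched in a few lines. This is a deliberately simple illustration, assuming invented toy records and a single hypothetical rule (“customers inactive for more than some number of days will churn”). A real churn model built with a statistician would be considerably richer.

```python
# Hypothetical toy records: (days_since_last_purchase, churned)
train = [(5, False), (12, False), (40, True), (33, True),
         (8, False), (27, True), (20, False)]
holdout = [(3, False), (45, True), (25, True), (15, False), (30, False)]

def predict(days, threshold):
    """Hypothesis: a customer inactive for more than `threshold` days churns."""
    return days > threshold

def accuracy(records, threshold):
    """Fraction of records where the prediction matches what happened."""
    hits = sum(1 for days, churned in records
               if predict(days, threshold) == churned)
    return hits / len(records)

def fit_threshold(records):
    """Refine the model: keep the threshold that best fits the records."""
    candidates = sorted({days for days, _ in records})
    return max(candidates, key=lambda t: accuracy(records, t))

threshold = fit_threshold(train)
print("threshold:", threshold)
print("accuracy on training data:", accuracy(train, threshold))
print("accuracy on held-out data:", accuracy(holdout, threshold))
```

Checking the fitted rule against records it has never seen is the “test” step. Where the held-out score falls short of the training score, the model has something left to learn, which is the cue to revisit the hypothesis.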

However, in the Petabyte Age, Mr. Anderson claims, “There is now a better way. Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models.”

Let’s suppose Mr. Anderson is on to something. His argument is that since models are often a poor representation of reality and tend to oversimplify things, we’re much better off with less explanation and more action based on identifying correlations.

In a sense, he is saying that in an age of massive data, we’re better off pursuing less knowledge and understanding.

I’m not sure I agree.

Peter Atkins, author of “Galileo’s Finger,” says it much better than I can: “With the rise of the computer and its ability to handle huge numerical calculations of the greatest intensity, we are seeing a shift from analysis to numerical computation. [This is dangerous because] resorting to numerical solution can distance us from understanding.”

It is true that model building is an imperfect science, and no model can be 100% accurate in all instances. It is, after all, just a model!

That said, models help us test our assumptions and verify our forecasts. They help us test what we think we know, with the ultimate goal of improving our decision making in real-life situations where marketing budgets and return on investment are on the line. Modeling helps us piece the world together and look beyond the patterns toward discovering “why” things happen as they do. And models help us transform reams of raw data into intelligence, thereby helping us predict outcomes with greater accuracy.

While focusing on correlation can help us make better decisions to a degree, philosophically I am concerned that we marketers may lose interest in truly understanding what makes our customers, prospects and partners tick. For me at least, I want to know more than “my customers do this, or they do that”; I want to know “why”!

In all fairness to Mr. Anderson, every coin has two sides. There is a fine line between too much pontification and too little action. Indeed, there are many instances where it doesn’t make sense to dig deeper for understanding, where it doesn’t matter “why,” only that a given solution produces results.