Big Data, Big Mistakes?

Now, I may be accused of getting up on my soapbox in this first post of 2012, but… a few recent articles on big data and predictive analytics have really got me thinking. Well, worrying, to be more precise. My worry is that there seems to be a growing belief in the somehow magical properties of big data, and a corresponding deification of those at the leading edge of working with big data and predictive analytics. What’s going on?

The first article I came across was “So, What’s Your Algorithm?” by Dennis Berman in the Wall Street Journal. He wrote on January 4th, “We are ruined by our own biases. When making decisions, we see what we want, ignore probabilities, and minimize risks that uproot our hopes. What’s worse, ‘we are often confident even when we are wrong,’ writes Daniel Kahneman, in his masterful new book on psychology and economics called ‘Thinking, Fast and Slow.’ An objective observer, he writes, ‘is more likely to detect our errors than we are.’”

I’ve read no more than the first couple of chapters of Kahneman’s book (courtesy of Amazon Kindle samples), so I don’t know what solution he proposes to the problem posed above: that we are deceived by our own inner brain processes. However, my intuitive reaction to Berman’s solution was visceral: how can he possibly suggest that the objective observer advocated by Kahneman could be provided by analytics over big data sets? In truth, Berman’s error is blatantly obvious in the title of his article… it is always somebody’s algorithm.

The point is not that analytics and big data are useless. Far from it. They can most certainly detect far subtler patterns, in far larger and therefore statistically more robust data sets, than most or even all human minds can. But the question of what constitutes a significant pattern and, more importantly, what it might mean remains the preserve of human insight. (I use the term “insight” here to mean a balanced judgment combining both rationality and intuition.) So, casting such systems in the role of objective observer, detecting and perhaps eliminating human error, is to me both mistaken and objectionable. It merely elevates the writer of the algorithm to the status of omniscient god. And not only omniscient, but often invisible too.

Which brings me to the second article that got me thinking… rather negatively, as it happens. “This Is Generation Flux: Meet The Pioneers Of The New (And Chaotic) Frontier Of Business” by Robert Safian was published by Fast Company magazine on January 9th. The breathless tone, the quirky black-and-white photos and the personal success stories all contribute to a sense (for me, anyway) that we are being asked to regard these people with awe. The premise that the new frontier of business is chaotic is worthy of deep consideration and, in my opinion, quite likely to be true. But the treatment is, as Scott Davis of Lyzasoft opined, “more Madison Avenue than Harvard Business Review”. It is quite clear that each of the pioneers here has made significant contributions to the use of big data and analytics in a rapidly changing business world. However, the converging personal views of seven pioneering people (presumably chosen for their shared views on the topic) hardly constitute a well-founded, thought-out theoretical rationale for concluding that big data and predictive analytics are the only, or even a suitable, solution for managing chaos in business.

As big data peaks on the hype curve this year (or has it done so already?), it will be vital that we in the Business Intelligence world step back and balance the unbridled enthusiasm and optimism of the above two articles with a large dollop of cold, hard realism, based on our many years’ experience of trying to garner value from “big data”. (Since its birth, BI has always been on the edge of data bigger than the technology of the time could comfortably handle.) So, here are three questions you might consider asking the next big data pioneer who is preaching about their latest discovery. First, what is the provenance of the data you used: its sources, how it was collected or generated, and its privacy and usage conditions? Second, can you explain in layman’s terms the algorithm you used? (Recall that a key cause of the 2008 financial crash was apparently that none of the executives understood the trading algorithms.) Third, can you give me two alternative explanations that might also fit the data values observed?

Big data and predictive analytics should be prompting us to think about new possibilities and to revisit old explanations. They should be challenging us to exercise our own insight. Unfortunately, it appears that they may be tempting some of us to do the exact opposite: to trust the computer output or the data science gurus more than we trust ourselves. “Caveat decernor”, to coin a phrase in something akin to pig Latin: let the decision maker beware!