Big Brother... or do I mean Big Data?

“Social networks already know who you know”, “recommendation engines get much smarter”, “early detection mitigates catastrophes”.

These are three of ten ways big data is creating the science-fiction future. Headlines like these appeal to the geek optimist in many of us. Mitigating a catastrophe is surely a good thing. Smarter recommendations about whom we should connect with and what we might like to buy could well save us time, that most precious of commodities.

Most of us have grown up believing that science and, by extension, technology and computers are a sine qua non of today's world. In truth, the world we live in today could not exist without them.

But, at what cost?

Three further headlines from the same blog: “surveillance gets really Orwellian”, “doctors can make sense of your genome–and so can insurers”, “dating sites that can predict when you’re lying”. Perhaps these give us pause for thought. Security cameras lurk in every corridor and public place. And, since last August, the NYPD has been monitoring Facebook and Twitter. Even in our bedrooms, smart phones can be turned on remotely to monitor our most intimate indiscretions. It’s open season on our actions and communications. Our genomes are fast becoming public property, ostensibly for our better health management but, clearly, also for better risk management (read: profit) by insurance companies. Even our thinking is being analyzed.

We’re fast reaching 1984 some 30 years later than George Orwell imagined, at least in our ability to monitor the actions, communications, genetic makeup and thoughts of an ever-increasing swathe of humanity. As BI experts and data scientists, we celebrate our ability to gather and analyze ever more data with ever more sophistication and ever deeper granularity. For marketers, Utopia is a segment of one whose buying behavior is predictable with certainty. As traders on the commodities or currency markets, our algorithms gamble on the Brownian motion of microscopic movements in prices. For insurers, statistical averaging of risk across populations gives way to cherry-picking low-risk individuals for discounted premiums.

Am I overly pessimistic or even paranoid in imagining that big data brings risks at least as large as the benefits it promises? Are the petabytes and exabytes of information we’re gathering, storing and analyzing open to misuse? We celebrate the role of social networking in pro-democracy movements around the world, imagining that tweets and texts are unassailable weapons for freedom, and forgetting that the networks that carry them are run by big businesses whose bottom line is profit. We reveal the secrets of our lives in dribs and drabs, in recordable phone conversations and even through the GPS tracking of our smart phones, oblivious to the fact that the technology exists to piece all the clues together, Sherlock Holmes-like, given sufficient time and money.

In my last post, I challenged us to take a step back and apply human insight to the results of big data analysis rather than taking the results of statistical analyses at face value, to question the sources and explore other possible explanations before jumping to conclusions. Now, knowing how fallible your own interpretation of big data may be, please consider the possibility that others, particularly those in positions of power such as governments and businesses, can accidentally or deliberately misinterpret or misuse the big data resource.

But what can we do as an industry? As individual analysts, consultants, data administrators and more? At the very least, we can revisit the privacy and security controls we build into our systems. Take a look at “Why you can’t really anonymize your data” by Pete Warden and begin pressing the industry and academia to search for new solutions. Look again at your business processes and evaluate whether and how the use of big data subverts the intentions or ethics of how you work. And, finally, reread George Orwell’s “1984”.