3 Sweet Big Data Lies

January 13, 2017

Big data – big potential, they say. They talk about how big their big data is and how many thousands of data points they have. They talk about quantity as if it mattered most. Big data is suddenly everywhere – everyone seems to be collecting it, analyzing it, processing it and making money from the undoubted successes that big data brings. But here are 3 sweet big data lies that the majority of people believe in (or choose to believe in). After all, sweet lies are sometimes a bit more pleasant than the bitter truth, right?

1. Big Data Solves All Problems

In the world of big data, the surrounding hype has spawned a brand-new premise: if you happen to use big data in your company, it automatically makes the company more successful. But companies brag about the size of their datasets much as an old fisherman lies about the size of the fish he’s caught (at least doubling the real size so that it sounds more impressive). Companies, just like the poor fisherman, want to feel big, significant and important, which is why some things get exaggerated. And even if nothing is exaggerated, a company is not doing well simply because it is collecting large datasets.

The presumed advantage of ever more information seems understandable – the more you know, the better the outcome you can expect. Unfortunately, once the data actually does get big, more problems arise: the more information there is, the more difficult it becomes to collect and systematize. So instead of bragging about the size of your datasets, give a shout-out once you have managed to collect and systematize your data so that it is actually possible to run an analysis on it, rather than letting it disappear into a cold data warehouse.

2. All Data We Have Is Good

Since a number of companies brag about how big their datasets are, the main question arises – how much data is necessary or even appropriate to improve the quality of your decisions? Does an extra piece of information (i.e. a data point) add any value, and if not, then why is it in your dataset? And in essence, how do we know that the data we collect is the data we need, and how can we make the most of the information we currently have?
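One very rough way to start asking whether a piece of information adds any value is to look for columns that never vary or are mostly empty. The sketch below is only an illustration under assumptions of my own: the function name `low_value_columns`, the 50% missing-value threshold and the sample customer rows are all made up, and a real check would depend on your data and your use case.

```python
def low_value_columns(rows, max_missing=0.5):
    """Return the names of columns that are constant or mostly missing.

    `rows` is a list of dicts; `max_missing` is an assumed, adjustable
    threshold for the share of missing (None) values we tolerate.
    """
    if not rows:
        return []
    columns = sorted({key for row in rows for key in row})
    flagged = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        # A column that never varies, or is mostly empty, tells us little.
        if missing_share > max_missing or len(set(present)) <= 1:
            flagged.append(col)
    return flagged


# Hypothetical sample data, invented for this illustration.
rows = [
    {"customer": "a", "country": "DE", "fax": None, "spend": 10},
    {"customer": "b", "country": "DE", "fax": None, "spend": 25},
    {"customer": "c", "country": "DE", "fax": "123", "spend": 7},
]
flagged = low_value_columns(rows)
print(flagged)
```

A flagged column is not automatically useless; it is simply a candidate for the question "why is this in the dataset?".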

Big data is only meaningful when it is used to optimize and automate solutions and to solve problems. We need to shift the focus from just collecting vast amounts of every possible kind of data to contextualizing the data we have collected within its own specific area. To make data valuable, it must be sorted, processed and used in models. Long story short – collecting data is awesome, but make sure you know how to make sense of it through effective, mostly automated procedures.

3. We Know What Data We Need

The biggest problem with big data is that understanding massive amounts of data is simply difficult – it’s not really comprehensible to humans at scale. And while I’m still a believer in data, big data has been turned into a kind of marketing term that suddenly makes your business sound way cooler if you happen to use it.

Let’s face it: data can be problematic. Even smaller sets of data can be quite a struggle to manage technically, as mentioned above. Worse still, nobody knows what data you might need until you experiment with it. If you aim to run many experiments (which is more than encouraged), you also need reliable ways of making experimentation routine. That depends not only on tools but mostly on a capable data science team that controls a meaningful framework for generating models (datasets used, overfitting, etc.) and develops appropriate target variables that enable new use cases for the data collected.

I don’t want to paint an apocalyptic picture in which everything related to big data is a lie. As companies start working with big data to tackle new business opportunities and enhance their existing offerings, the biggest challenges come in data management, cleaning procedures and finding the right data science team members. If you’ve got those three covered, you’re good to go, though constant experimentation and radical customization will be the new hallmarks of competition: they are a never-ending process that moves your company forward step by step. Make sure you don’t lag behind.