The Big Data Myth

November 16, 2012

Focus group moderators have a tough job. After each focus group with 6 to 10 people, they have reams of notes to work through, to analyze, to review for meaningful insights. It’s not an easy job because people don’t always say what they mean, they aren’t sure how to express themselves, and they talk over each other making for one massively messy dataset.

But moderators have tricks of the trade: cool software tools that help them deal with these datasets. Now imagine how overwhelmed a moderator would feel if they suddenly had to work with the same kind of dataset based on 300 participants. Overwhelmed, that is, until they found the right tools.

Now imagine you’re a survey specialist who’s used to handling datasets of hundreds, maybe thousands, of completes. It’s a lot of data, but you’re an experienced survey researcher. You’ve got tools at hand, like Excel, SPSS, or SAS, that make your job easier. You’ve learned how to handle these big datasets. But what if you suddenly had to work with a dataset with millions of records? Nothing could be more overwhelming. Until you had the right tool at hand.

Finally, imagine you’re a data miner who’s used to handling transactional data, point-of-sale datasets with thousands of variables and millions or billions of records. It’s a massive dataset but, once again, you’ve got the experience and data mining tools like SQL to deal effectively with these massive datasets. But what if you suddenly had to deal with datasets a hundred times larger? Well, it would only be scary until you found the right tools, tools that other people have been using for a long time.

What’s my point? When you normally deal with datasets with a hundred records, any dataset with a thousand records is overwhelming and paralyzing. And when you normally deal with datasets with a thousand records and now you’ve been asked to work with a dataset with a million records, that volume of data is overwhelming. It’s not the size of the data that makes it big data; it’s your experience with that size of data that makes it big data.

So really, big data is a myth. There are simply datasets that are larger than what you are used to working with, and that you don’t yet have sufficient experience or tools to work with. That doesn’t deserve a new name. It deserves time and patience to gain a new sense of comfort and to learn the tools that other people have already been using for a long time. Nothing more.

Big data? No such thing.

Tagged: Big data