Lately, I have been thinking about the entire big data trend. Fundamentally, it makes sense to me and I believe it is useful for some enterprise-class problems, but something about it has been troubling me, so I decided to take some time and jot down my thoughts. As I thought more about it, I realized my core issue is with some of the oversimplified rhetoric I hear about what big data can do for businesses. A lot of it is propagated by speakers and companies at big-name conferences and subsequently echoed by many blogs and articles. Here are the three main myths that I regularly hear:
1. More data = More insights
An argument I have heard a lot is that with enough data, you are more likely to discover patterns, facts and insights. Moreover, with enough data, you can discover patterns and facts using simple counting that you can’t discover in small data using sophisticated statistical methods.
This is true, but only as a research concept. For businesses, the key barrier is not the ability to draw insights from large volumes of data; it is asking the right questions, the ones for which they actually need an insight. It is never wise to generalize about the usefulness of large datasets, since the ability to provide answers depends on the question being asked and the relevance of the data to that question.
2. Insights = Actionability = Decisions
It is almost an implicit assumption that insights will be actionable and, since they are actionable, that business decisions will be made based on them.
There is a huge gap between insights and actionability. Analysts always find very interesting insights, but only a tiny fraction of them will be actionable, especially if one has not started with a very strong business hypothesis to test.
Even more dangerous is the assumption that because an insight is actionable, an executive will make the decision to implement it. Ask any analyst who has worked in a large company and he or she will tell you that the realities of business context and the failure of rational choice theory stand in the way of a lot of good actionable insights turning into decisions.
3. Storing all data forever is a good thing
This is the Gmail pitch. Enterprises do not have to decide which data to store and which to purge. They can and should store everything because of Myth 1: more data means more insights and competitive advantage. Moreover, storage is cheap, so why would you not store all data forever?
Remember the backlash against Gmail, which did not have a delete button when it started? The fact is that there is a lot of useless data, which increases the noise-to-signal ratio. Enterprises struggle with data quality issues, and storing everything without any thought as to which data is useful for which kinds of questions does more harm than good. Business-centric approaches to data quality and data architecture have a significant payoff for downstream analytics, and we should give them their due credit when we talk about big data.
To sum up where I stand:
1. There is a lot of headroom left for small data insights that enterprises fail to profit from.
2. There are indeed some very interesting use cases for big data that are useful for enterprises (even the non-web-related ones).
3. But the hype and the oversimplification of the benefits, without thoughtful consideration of the issues and barriers, will lead to disappointment and disillusionment in the short run.