Big Data Is Really Dead

IDG Enterprise’s 2015 Big Data and Analytics survey shows that the number of organizations with deployed or implemented data-driven projects has increased by 125% over the past year. The momentum continues to build.

Big Data as a concept is characterized by the 3Vs: Volume, Velocity, and Variety. Big Data implies a huge amount of data, and because of its sheer size it tends to be unwieldy. The dominant implementation is Hadoop, which is batch-based. More than a handful of companies blindly collect large amounts of noisy data without knowing how to cleanse it, let alone how to transform, store, and consume it effectively. They simply set up an HDFS cluster to dump the data they gather and then label it as their “Big Data” solution. Unfortunately, this practice is what actually marks the death of Big Data.

Collecting a lot of data is useless if the data is not properly utilized. The key is the systematic exploration of the data with the right set of questions. For instance, is the data uniform or irregular? Is there a significant amount of variation in the data set? Is it buried in a mass of other irrelevant information? Can it be easily extracted and transformed? Is the data collected from WordPress-based websites? Is it possible to load the data at a reasonable speed? Can it be thoroughly analyzed? Can powerful insights be garnered? Without that kind of exploration, Big Data alone, in the old style, is obsolete, and there are substitutes.

One trend is Fast Data, which is the processing of massive data in real time to gain instant awareness and detect signals of interest on the spot. Stream processing systems such as Storm make it easy to process unbounded streams of data reliably as they arrive. In-memory processing engines such as Spark can run up to 100x faster than MapReduce on workloads that fit in memory.

Another movement is Actionable Data, which combines predictive analytics and what-if analysis to prescribe recommendations, so that you can take action and learn from the feedback.
Social analytics, for example, empowers businesses to distill the meaning and hidden value behind reams of social data and activity in order to glean actionable insights.

A newer shift is Relevant Data. Relationships within the data are critical for identifying what is pertinent in the data set, which leads to a deeper understanding of seemingly unrelated events and sequences. The focus is on analyzing vast amounts of data from numerous sources and contextualizing each piece of data with its own specific semantics. For example, linking various activities and events through data helps increase the transparency of an existing process, improve the effectiveness of a procedure, or develop new capabilities to enhance the next set of outcomes.

A fourth direction is Smart Data. Meaning-based computing and cognitive analytics make solutions intelligent and self-improving. Knowledge-based reasoning results in sounder decisions. For instance, intelligent content personalization leverages all the data accumulated from B2C and social channels, not only to optimize the content display but also to heighten the user experience.

All in all, Fast Data, Actionable Data, Relevant Data, and Smart Data (FARS) are well poised today to replace Big Data as the new paradigm.