Handling The Big Data Faucet

December 2, 2011

Between millions of blogs, hundreds of open social information streams, and a massive number of popular platforms for online conversations, companies face a growing challenge: keeping up with a continuous flow of unstructured data pouring out of a wide-open big data faucet.

One of the biggest challenges is how information professionals can get a grip on this big data faucet and wrangle in only the most pertinent nuggets of insight critical to their business. Another challenge is the wide range of APIs supported by these sources; as new sources come online, they too will bring their own proprietary APIs.

Many developers have either created or attempted to reverse engineer the data models associated with popular social networks, including Twitter and Facebook, that map to the data returned by their APIs. Furthermore, some of these APIs are specific to subsets of authorized functionality; for example, the Facebook analytics API is available only to authorized users.
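In practice, that mapping work usually means parsing an API's JSON responses into an explicit data model. Here is a minimal sketch in Python; the payload shape and field names are illustrative assumptions, not the actual schema of any vendor's API.

```python
import json
from dataclasses import dataclass

# Illustrative payload shaped like a social "post" object; the field
# names here are assumptions for the sketch, not a real API schema.
raw = '{"id": "123_456", "message": "Hello", "created_time": "2011-12-02T10:00:00"}'

@dataclass
class Post:
    """A small, hand-built data model reverse engineered from the payload."""
    post_id: str
    message: str
    created_time: str

def parse_post(doc: str) -> Post:
    # Map the raw JSON fields onto the model, tolerating a missing message.
    data = json.loads(doc)
    return Post(
        post_id=data["id"],
        message=data.get("message", ""),
        created_time=data["created_time"],
    )

post = parse_post(raw)
print(post.post_id)  # 123_456
```

The fragile part is exactly what the paragraph above describes: every field name in `parse_post` is a guess about someone else's schema, and it breaks silently when the provider changes their API.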

Keeping these data models in sync can be quite challenging, especially when multiple social data sources must be combined to provide better overall analysis for the target business user or business. An example would be combining data from Facebook, LinkedIn and Twitter. The other issue is that most of the critically important data from these social sources is unstructured. The key to making these sources easier to combine is the evolution of data stores that require no pre-defined schema and that support structured, semi-structured and unstructured data. These new sources of unstructured data are driving this change in the industry.
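To make the "no pre-defined schema" idea concrete, here is a toy document store in Python: records from different sources keep their own shapes, and queries simply tolerate missing fields. This is a sketch of the concept, not any particular product's API.

```python
# A minimal schema-free document store: every record is just a dict,
# and no table definition constrains which fields it may carry.
store = []

def insert(doc):
    store.append(doc)

def find(predicate):
    # Queries filter by a predicate instead of fixed columns,
    # so records with different shapes can live side by side.
    return [d for d in store if predicate(d)]

# A structured warehouse-style row, a semi-structured social record,
# and an unstructured free-form note, all in the same store:
insert({"customer_id": 42, "region": "EMEA", "revenue": 1200.0})
insert({"source": "twitter", "user": "acme", "text": "Love the new release!"})
insert({"source": "blog", "body": "Long free-form product review..."})

# Query across shapes without a pre-defined schema:
social = find(lambda d: d.get("source") in ("twitter", "blog"))
print(len(social))  # 2
```

Real schema-flexible stores add indexing, durability and distribution on top, but the core contract is the same: the reader, not the storage layer, decides what structure to impose.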

Data integration tools are also changing so that they can hang off a variety of new social data firehoses and support unstructured-data platforms like Hadoop. An ETL tool's JSON adapter or interface is frequently used to connect to everything from the Facebook Graph API to the Twitter API. Many data integration vendors have announced support for connecting to Hadoop, and the Hadoop project itself has created integration tooling; a good example is Sqoop.
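The core job of such a JSON adapter is turning nested API payloads into flat rows an ETL pipeline can load into a tabular target. A minimal sketch in Python, assuming a hypothetical paged response; the field names are invented for illustration, not any vendor's real schema:

```python
import json

# Hypothetical response shaped like a paged social-API result;
# the structure is an assumption for this sketch.
response = json.loads("""
{"data": [
  {"id": "1", "from": {"name": "Alice"}, "message": "hi"},
  {"id": "2", "from": {"name": "Bob"},   "message": "hello"}
]}
""")

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names for a tabular target."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

rows = [flatten(r) for r in response["data"]]
print(rows[0])  # {'id': '1', 'from.name': 'Alice', 'message': 'hi'}
```

From here, the flat rows can be handed to a conventional load step, which is why a single generic JSON adapter can front many different social APIs.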

New front-end innovations are also emerging to help business intelligence professionals overcome the challenges of analyzing the combination of structured, semi-structured and unstructured data. Just as data stores are being enhanced to support a variety of data, front-end solutions for analyzing, visualizing and discovering information from that data are arriving quickly. Business intelligence professionals are skilling up on these new discovery-based solutions just as fast, as the pressure to analyze all this data continues to bear down on them.

With all these challenges comes great new innovation for the future of business intelligence and data management. Our industry is going through a major inflection point, and with this change come huge opportunities to innovate. We are only scratching the surface of what is possible. We saw this when mainframe computing gave way to client/server computing, when client/server gave way to the web, and now as the web moves more and more to mobile. Those shifts created opportunities, and the big data shift is doing it again.