Big Data: It’s About the Data, Not About the Big

Today my local paper ran an article on the 17th annual Knowledge Discovery and Data Mining (KDD) conference. The gist was that data analytics is in demand and companies are looking for good people who understand that realm. Of course, the driving force behind all this was the movement toward “BIG DATA.”

It is a little amusing to those of us who have been in the data warehousing game a long time, especially with Teradata, to hear the term BIG DATA. In Teradata's early days, people openly wondered whether anyone would ever need to handle a terabyte of data, yet today there is talk of petabytes and beyond with hardly a blink of an eye.

Editor’s note: Rob Armstrong is an employee of Teradata. Teradata is a sponsor of The Smart Data Collective.

Big data is not about volume, though it may certainly be voluminous. The accent is really on the second word: the data. The term applies to many new and varied data sources such as RFID, geospatial data, smart meters, social network data, and web logs. Yes, this data can be big, but more importantly it provides a very rich source of analytical opportunity.

You can think of this in a few ways. One is data such as smart metering for a utility company. Tracking usage at discrete 15-minute intervals will certainly bring much more data into the environment. Compared with a once-a-month reading, this is roughly a 3,000-fold jump (96 readings a day for 30 days, or 2,880 rows, rather than 1 row per month)! But the data still has the same structure and is very “relationally friendly.” It fits the existing model and enables much better analytics, leading to network optimization, marketing of new rates and services, and integration with home appliances. So here the data is big but easy to handle, provided you have the technology and a scalable environment that allows that type of growth.
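To make the smart-meter arithmetic concrete, here is a minimal Python sketch. It assumes a hypothetical interval table of (meter_id, timestamp, kwh) rows, a 30-day month and a constant 0.25 kWh reading; the names and values are illustrative, not any utility's actual schema. It counts the rows one meter produces per month and rolls them back up to the single monthly row the old model held, showing that the finer-grained data keeps the same relational shape.

```python
from collections import defaultdict
from datetime import datetime, timedelta

READINGS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute intervals per day
DAYS_PER_MONTH = 30               # assumption: a 30-day billing month


def interval_rows_per_month(days: int = DAYS_PER_MONTH) -> int:
    """Rows one meter produces per month at 15-minute granularity."""
    return READINGS_PER_DAY * days


def monthly_totals(readings):
    """Roll (meter_id, timestamp, kwh) interval rows up to one row per meter per month."""
    totals = defaultdict(float)
    for meter_id, ts, kwh in readings:
        totals[(meter_id, ts.year, ts.month)] += kwh
    return dict(totals)


# Hypothetical sample: one meter, a constant 0.25 kWh every 15 minutes for 30 days
start = datetime(2011, 8, 1)
readings = [("M-001", start + timedelta(minutes=15 * i), 0.25)
            for i in range(interval_rows_per_month())]

print(interval_rows_per_month())  # 2880
print(monthly_totals(readings))   # {('M-001', 2011, 8): 720.0}
```

Because each reading is just another row with the same columns, the existing model absorbs the increase; the challenge is scale, not structure.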

The other side of the coin is data that is not “relationally friendly.” This may be data from social sites, call center recordings, customer feedback forms, or the like. Here you have free text (or even txt, LOL :) ). The data needs to be interrogated for content and context and then prepared for integration into the relational model. This is where “big data” creates a whole new level of data transformation need, and where environments such as Hadoop or Aster Data come into play. Being able to analyze the data streams and their many-to-many relationships yields some good insights. The goal of the KDD conference is finding people and processes that can make sense of the data, or at least processes that allow the data to make sense.
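As a rough illustration of “interrogating” free text and preparing it for the relational model, the Python sketch below turns short posts into rows of (post_id, product_id, sentiment). The keyword-to-product map and the tiny sentiment word lists are hypothetical stand-ins; real text analytics, whether on Hadoop, Aster Data or elsewhere, would use far richer parsing and models.

```python
import re

# Hypothetical product catalog: keyword -> product_id (illustrative only)
PRODUCT_KEYWORDS = {"widget": "P100", "gadget": "P200"}
# Hypothetical, deliberately tiny sentiment lexicon (not a real model)
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "broken", "slow"}


def interrogate(post_id, text):
    """Extract content (products mentioned) and context (naive sentiment) as relational rows."""
    words = re.findall(r"[a-z']+", text.lower())
    # Booleans sum as 0/1: positive hits minus negative hits
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    products = sorted({PRODUCT_KEYWORDS[w] for w in words if w in PRODUCT_KEYWORDS})
    # One row per product mentioned: a many-to-many link between posts and products
    return [(post_id, product_id, sentiment) for product_id in products]


posts = [
    ("t1", "Love my new widget, setup was fast!"),
    ("t2", "The gadget arrived broken and support was slow."),
    ("t3", "Comparing the widget and the gadget this weekend."),
]
rows = [row for post_id, text in posts for row in interrogate(post_id, text)]
for row in rows:
    print(row)
```

Note that the third post yields two rows, one per product, which is exactly the kind of many-to-many relationship that has to be resolved before the data can join cleanly to existing tables.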

Once you have processed the “BIG DATA” and gleaned some insights from the data streams, you will also want to integrate that data into your corporate data structure. Have tweets about a product you sell? Link those tweets into the data model and join on product ID. Getting comments from Facebook users about your service? Link that sentiment into your data model using customer attributes such as e-mail address or Facebook ID.
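Once social data has been reduced to rows, integration is an ordinary relational join. This minimal sketch uses Python's built-in sqlite3 module with an in-memory database; the table and column names (product, customer, tweet_mention, fb_comment) and the sample rows are hypothetical. It joins tweet mentions to products on product_id and Facebook comments to customers on e-mail address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: existing corporate tables plus tables loaded from processed social data
cur.executescript("""
CREATE TABLE product (product_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE tweet_mention (tweet_id TEXT, product_id TEXT, sentiment TEXT);
CREATE TABLE fb_comment (comment_id TEXT, email TEXT, sentiment TEXT);
""")
cur.executemany("INSERT INTO product VALUES (?, ?)",
                [("P100", "Widget"), ("P200", "Gadget")])
cur.executemany("INSERT INTO customer VALUES (?, ?)",
                [(1, "ann@example.com"), (2, "bob@example.com")])
cur.executemany("INSERT INTO tweet_mention VALUES (?, ?, ?)",
                [("t1", "P100", "positive"), ("t2", "P200", "negative"),
                 ("t3", "P100", "neutral")])
cur.executemany("INSERT INTO fb_comment VALUES (?, ?, ?)",
                [("c1", "ann@example.com", "positive"),
                 ("c2", "carol@example.com", "negative")])

# Tweets linked to products on product_id, summarized by sentiment
product_buzz = cur.execute("""
    SELECT p.name, t.sentiment, COUNT(*)
    FROM tweet_mention t JOIN product p ON p.product_id = t.product_id
    GROUP BY p.name, t.sentiment
    ORDER BY p.name, t.sentiment
""").fetchall()

# Facebook sentiment linked to known customers on e-mail address
customer_sentiment = cur.execute("""
    SELECT c.customer_id, f.sentiment
    FROM fb_comment f JOIN customer c ON c.email = f.email
""").fetchall()

print(product_buzz)
print(customer_sentiment)
```

The inner join drops the comment from carol@example.com because she is not in the customer table; in practice you might use a LEFT JOIN to keep unmatched sentiment for later matching on another attribute such as a Facebook ID.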

The real point of all this data is that by itself it is just “BIG DATA”, but when analyzed and integrated into a wider perspective it becomes “BIG OPPORTUNITY”.
