Considering the Data Diet

September 2, 2009

I had the unique pleasure of meeting with Pete Fader and Eric Bradlow from the Wharton Interactive Media Initiative last week in Philadelphia. While it wasn’t the point of the meeting, I was especially interested in their work on companies going on a data diet. This is somewhat antithetical to the historical position of data warehousing, which holds that every bit of data should be kept for as long as possible, but I think they are doing important work in this space.

Not because I think companies need less data, but because they are going to be getting more and more of it. In the emerging Web Squared environment, the World Wide Web meshes with devices, sensors, and geospatial and temporal data, and those devices interact with the mesh to transform not just the consumer experience but the entire value chain. In that world, the sheer volume of data that can be analyzed will matter far less than identifying the data that is critical to the problem you are trying to solve.

As Melyssa Plunket-Gomez from Crimson Hexagon showed me with social media data, much of the analysis comes down to what you exclude. When looking at blog traffic for sentiment about a company, you have to exclude items like job postings, because they are irrelevant to the analysis even though they contain all the right keywords. I think this is a perfect example of watching what you analyze, a critical part of any data diet.
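
To make the exclusion idea concrete, here is a minimal Python sketch of a pre-filter that keeps posts mentioning a company and drops those that look like job listings. The regular-expression patterns, the function names (`is_job_posting`, `sentiment_candidates`), and the sample posts are all hypothetical. This is not meant to describe how Crimson Hexagon's tools work, and a production system would more likely use a trained classifier than hand-written keyword rules.

```python
import re

# Hypothetical patterns suggesting a post is a job listing rather than opinion.
JOB_POSTING_PATTERNS = [
    r"\bwe(?:'re| are) hiring\b",
    r"\bjob (?:opening|posting|description)\b",
    r"\bapply (?:now|today)\b",
    r"\bresponsibilities\b",
    r"\brequirements?:",
]
JOB_RE = re.compile("|".join(JOB_POSTING_PATTERNS), re.IGNORECASE)


def mentions_company(text: str, company: str) -> bool:
    """True if the post contains the company name (case-insensitive)."""
    return company.lower() in text.lower()


def is_job_posting(text: str) -> bool:
    """Heuristic: True if any job-listing pattern appears in the post."""
    return JOB_RE.search(text) is not None


def sentiment_candidates(posts: list[str], company: str) -> list[str]:
    """Keep posts that mention the company but are not job postings."""
    return [p for p in posts
            if mentions_company(p, company) and not is_job_posting(p)]


# Hypothetical sample: all three posts are invented for illustration.
posts = [
    "Acme's new phone is fantastic, best purchase this year.",
    "Acme is hiring! Apply now for a senior analyst role.",
    "Weather is nice today.",
]
print(sentiment_candidates(posts, "Acme"))
# → ["Acme's new phone is fantastic, best purchase this year."]
```

The job posting has the right keyword ("Acme") but is dropped before any sentiment scoring happens, which is the point of watching what you analyze.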

And while I think the data diet label can be too simplistic, the concept has an important role to play in how companies use tools like Teradata’s multi-temperature data warehousing solutions. Which data is hot and needs high-performance disks and chips, and which data is not needed all the time and can be moved to lower-cost storage? A data diet approach can help determine where to invest in data.
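
One way to picture the hot/warm/cold question is a simple rule based on how recently and how often a table is queried. The sketch below is a hypothetical Python illustration, not Teradata's multi-temperature implementation; the `TableStats` record, the 100-query and 90-day thresholds, and the sample tables are assumptions chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableStats:
    """Hypothetical usage record for one table."""
    name: str
    last_accessed: datetime
    queries_last_30_days: int


def temperature(stats: TableStats, now: datetime,
                hot_queries: int = 100, cold_after_days: int = 90) -> str:
    """Classify a table as 'hot', 'warm', or 'cold' (assumed thresholds)."""
    if now - stats.last_accessed > timedelta(days=cold_after_days):
        return "cold"  # untouched for a long time: candidate for low-cost storage
    if stats.queries_last_30_days >= hot_queries:
        return "hot"   # heavily used: keep on high-performance storage
    return "warm"      # used occasionally: a middle tier


# Hypothetical sample tables, evaluated as of the post's date.
now = datetime(2009, 9, 2)
tables = [
    TableStats("daily_sales", now - timedelta(days=1), 450),
    TableStats("web_clicks_2006", now - timedelta(days=400), 0),
    TableStats("store_returns", now - timedelta(days=10), 12),
]
for t in tables:
    print(t.name, temperature(t, now))
# daily_sales hot
# web_clicks_2006 cold
# store_returns warm
```

Note that "cold" here means moving data to cheaper storage, not deleting it, which keeps open the option of answering tomorrow's unknown questions.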

One of the challenges I see for data geeks in the coming years is deciding what data you need to answer tomorrow’s questions. I don’t know what those questions are, so I don’t know what data I will need. There is a big cost to not having the information you need when you need it. A data diet risks summarizing the value out of the data and eliminating data you need but don’t yet know you need. Sometimes the most expensive data is the data you threw away.

Paul Barrett

—————
A couple of follow-ups to earlier posts:

Consumers Taking Control > The district manager of the bookstore that didn’t get my money invited me to come to the store and meet over coffee. I am looking forward to it. I will probably make him spring for an expensive mocha instead of my usual value-priced Iced Double Espresso, which costs under $2 almost anywhere.

Millennials have never lived in a Batch World > I am getting ready for my presentation “Customer Time is Now Real-Time” at the Teradata Partners conference, coming up in October. The presentation is about how the confluence of the real-time web and a young generation of consumers is changing the rules for businesses. If you have comments or thoughts on Millennials or the real-time web and how it is affecting your business, please drop me a note at paul.barrett@teradata.com
