Big Data. New Physics.

November 19, 2010
I spoke at Defrag 2010 earlier today and introduced what I am describing as the new physics of big data.

Having designed and deployed a number of multi-billion-row, context-accumulating systems over the last 14 years, I cannot help but notice some very interesting, very exciting phenomenology. Not research. Not theory. Real.

1. Better Prediction.  Simultaneously lower false positives and lower false negatives.  A bit more about this here: Prediction: Channel Consolidation and Puzzling: How Observations Are Accumulated Into Context. 

2. Bad data good.  More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful.  A bit more about this here: "It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You" and "There Is No Such Thing As A Single Version of Truth."

3. More data faster.  Less compute effort as the database gets bigger.  A bit more about this most exciting phenomenon here: The Fast Last Puzzle Piece.

4. Selective attention and curiosity.  A better sense of when and where to place one’s attention (apply compute effort) including: (1) very smart observation filters and (2) fully automated ability to determine very specific, very relevant questions – answers to which it may decide to fetch itself.  A system that Googles itself?  Unfortunately, I have not blogged about this thinking yet, but will hopefully get to it one of these days.

Anyway, imagine that: As the database grows, fewer CPU cycles are needed for better predictions and you never really wanted to clean all that data up in the first place.
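
To make the "bad data good" point a little more concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how any production system works: the `Entity` class, the `keep_variants` flag, the exact-match rule, and the sample names and phone numbers are all made up for this example. The idea it tries to show is that a misspelling retained in context can be the very thing that lets a later observation connect.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One accumulated context: every name spelling and phone seen so far."""
    names: set = field(default_factory=set)
    phones: set = field(default_factory=set)


def matches(entity, name, phone):
    # Toy rule (an assumption): an observation joins an entity when it
    # shares an exact name spelling or an exact phone number with it.
    return name in entity.names or phone in entity.phones


def run(observations, keep_variants=True):
    """Accumulate observations into entities, one observation at a time."""
    entities = []
    for name, phone in observations:
        for entity in entities:
            if matches(entity, name, phone):
                if keep_variants:
                    entity.names.add(name)  # keep the "bad" spelling as context
                entity.phones.add(phone)
                break
        else:
            # No existing entity matched, so start a new one.
            entities.append(Entity({name}, {phone}))
    return entities


# Hypothetical observations, invented for this sketch.
observations = [
    ("Jon Smith", "555-0100"),
    ("Jon Smyth", "555-0100"),  # misspelling, same phone: joins the first entity
    ("Jon Smyth", "555-0199"),  # new phone: matches only via the kept misspelling
]

print(len(run(observations)))                       # 1
print(len(run(observations, keep_variants=False)))  # 2
```

With the variant spelling kept, all three observations land in one entity. With the spelling "cleaned" away, the third observation has nothing to match against and becomes a second, disconnected entity.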

I also took this keynote opportunity to share my latest skunk works project – a project my team and I have been working on for almost two years now.  Yes, it’s true, I am building something – a sensemaking engine, designed to fully harness this big data phenomenon.  Among other exciting properties, this system will also have an unprecedented number of privacy-enhancing features baked into it.  Internally I have been calling this little skunk works effort “G2.”  And when this little girl grows up I have big hopes for her.  For example, maybe she will help cancer researchers find a cure.

My Defrag 2010 MS PowerPoint presentation is here.