Big Data and Real-time Structured Data Analytics -…

August 13, 2009

Big Data and Real-time Structured Data Analytics – O’Reilly Radar

The emergence of sensors as sources of Big Data highlights the need for real-time analytic tools. Popular web apps like Twitter, Facebook, and blogs also have to analyze (mostly unstructured) data in near real-time. But as Truviso founder and UC Berkeley CS Professor Michael Franklin recently noted, web apps also generate mountains of structured data that lend themselves to real-time analysis:

“The information stream driving the data analytics challenge is orders of magnitude larger than the streams of tweets, blog posts, etc. that are driving interest in searching the real-time web. Most tweets, for example, are created manually by people at keyboards or touchscreens, 140 characters at a time. Multiply that by the millions of active users and the result is indeed an impressive amount of information. The data driving the data analytics tsunami, on the other hand, is automatically generated. Every page view, ad impression, ad click, video view, etc. done by every user on the web generates thousands of bytes of log information. Add in the data automatically generated by the underlying infrastructure (CDNs, servers, gateways, etc.) and you can quickly find yourself dealing with petabytes of data.”

The Smarter Planet tumblelog is an outgrowth of IBM’s strategic initiative to help a world of smart systems emerge.

Link to original post

To see just the posts related to the “new intelligence” (advanced business intelligence, predictive analytics, decision support, and large-scale data management), try this link:
http://smarterplanet.tumblr.com/tagged/new_intelligence

See this primer on Smarter Planet