Big Data Analytics and Cloud Analytics: Embracing the Cloud and the Big Data Grid

July 10, 2011

Sometimes you have a choice.  You either jump in and join the game, or sit on the sidelines and watch other teams play.  Core Analytics and Game Loyalty are joining in the Cloud game.  Here’s a preliminary introduction to what’s currently going on in this space.  (Saturday is a good day to go through the professional magazines and newspapers that have accumulated, before recycling them.)

Sometimes you have a choice.  You either jump in and join the game, or sit on the sidelines and watch other teams play.  Core Analytics and Game Loyalty are joining in the Cloud game.  Here’s a preliminary introduction to what’s currently going on in this space.  (Saturday is a good day to go through all the professional magazines and newspapers that have accumulated, before recycling).

I’ve asked the question many times over the past year: what exactly is cloud computing, and how does it differ from current systems?  (I’m still under the impression that some rebranding and productizing of something that’s been around for a while is going on.)  Cloud computing involves using many networked server computers, typically in a provider’s shared data centers, to host and centralize enterprise applications and data.  Without a cloud, an application might run on a single web server that you own; with a cloud, you can draw on many web servers by renting capacity in a large shared server pool.  The advantages are that workloads can be balanced across servers, capacity can grow or shrink on demand (it is effectively unlimited from the customer’s point of view), and you pay for computing and storage based on usage.  Essentially, this shared computing and storage system operates much like the energy grid (a network of independently owned and operated plants and transmission lines), where distributed data flow and processing power are analogous to electric power transmission.
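The pay-for-usage point can be made concrete with a toy cost model.  This is a minimal sketch, not any provider’s billing scheme: the rate and per-server capacity (RATE_CENTS_PER_SERVER_HOUR, REQUESTS_PER_SERVER_HOUR) are hypothetical numbers chosen only to compare renting servers hour by hour against provisioning fixed capacity for the peak hour.

```python
import math
from typing import List

# Hypothetical figures for illustration only; not any provider's actual pricing.
RATE_CENTS_PER_SERVER_HOUR = 10   # assumed rent for one server for one hour
REQUESTS_PER_SERVER_HOUR = 1000   # assumed capacity of one server per hour


def servers_needed(requests: int) -> int:
    """Smallest number of servers (at least one) that can absorb an hour's load."""
    return max(1, math.ceil(requests / REQUESTS_PER_SERVER_HOUR))


def usage_based_cost_cents(hourly_requests: List[int]) -> int:
    """Elastic model: rent only as many servers as each hour actually needs."""
    return sum(servers_needed(r) * RATE_CENTS_PER_SERVER_HOUR for r in hourly_requests)


def fixed_capacity_cost_cents(hourly_requests: List[int]) -> int:
    """Fixed model: size for the peak hour and pay for that capacity every hour."""
    peak_servers = max(servers_needed(r) for r in hourly_requests)
    return peak_servers * RATE_CENTS_PER_SERVER_HOUR * len(hourly_requests)


if __name__ == "__main__":
    load = [500, 800, 4000, 1200]  # hypothetical requests per hour
    print(usage_based_cost_cents(load))     # 80
    print(fixed_capacity_cost_cents(load))  # 160
```

With this made-up load, usage-based renting needs 8 server-hours while peak provisioning pays for 16, which is the intuition behind paying rent based on usage.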

Another related term is Big Data.  It’s appropriately named: these are datasets, commonly generated by weblogs, social networks and other social data, that contain quantities of data beyond the ability of most existing software and technology to efficiently store, manage and analyze.  New technologies, many of them open source, are emerging to handle these massive amounts of data.  Some of the current leaders in the space are MPP (massively parallel processing) databases, data grids, HPCC/ECL, Hadoop and other MapReduce technologies, scalable storage solutions, and cloud computing.
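MapReduce splits a job into a map step that emits key/value pairs and a reduce step that combines all values sharing a key.  The sketch below is a single-process illustration of that model, counting tokens in weblog lines; it is not Hadoop’s API, and the function names and sample data are hypothetical.  On a real cluster, the map and reduce calls would run in parallel on many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (token, 1) pair for every token in one input record."""
    for token in record.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in one process (a cluster would distribute them)."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    weblog = ["GET /home", "GET /cart", "POST /cart"]  # hypothetical weblog lines
    print(map_reduce(weblog))  # {'get': 2, '/home': 1, '/cart': 2, 'post': 1}
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be spread across machines without coordination, which is what lets this model scale to Big Data volumes.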

The June 27, 2011 issue of InformationWeek features an article in the analytics section entitled ‘Ballmer Defines ‘Big Data’ in Microsoft’s Terms’.  To summarize the article, Microsoft is making Big Data a top priority, and it is approaching this from a different angle than Oracle (which benefits from its acquisition of Sun Microsystems and from bundling data warehousing) and IBM (which has a grid-based NAS solution).  Ballmer is focusing on how Big Data and cloud computing combine across customers’ on-premises data centers and Microsoft’s online data centers.  Microsoft is a relatively late entrant to the Big Data game, having introduced SQL Server 2008 R2 Parallel Data Warehouse in late 2010.  Ballmer says ‘Nobody plays in big data, really, except Microsoft and Google’.  What he’s referring to here are the Bing and Google search engines, which rely on infrastructure that routinely manages petabytes of data daily.  That processing stands out because it scales to petabyte-class deployments.  The real value, though, is in leveraging the data for Business Intelligence and Analytics.  Nielsen and Acxiom are other leaders in the cloud computing and big data analytics space.  There’s also an interesting NoSQL conference, related to the Big Data topic, coming up in San Jose at the end of August.
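To put ‘petabytes of data daily’ in perspective, here is a back-of-the-envelope rate calculation, assuming decimal units (1 PB = 10^15 bytes) and a load spread evenly over 24 hours.  The figures are illustrative only, not reported numbers for Bing or Google.

```latex
\[
\frac{1\ \text{PB}}{1\ \text{day}}
  = \frac{10^{15}\ \text{B}}{86{,}400\ \text{s}}
  \approx 1.16 \times 10^{10}\ \text{B/s}
  \approx 11.6\ \text{GB/s}
\]
```

Spread across a hypothetical 1,000 servers, that works out to roughly 11.6 MB/s per server, which illustrates why such workloads are divided among many machines rather than handled by one.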