Experimentation, Open Source and Big Data

September 26, 2011

An interesting article in BusinessWeek on Big Data recently caught my eye. The article mentions different applications that allow organizations to make sense of the vast–and exponentially increasing–amount of unstructured data out there. From the piece:

“When the amount of data in the world increases at an exponential rate, analyzing that data and producing intelligence from it becomes very important,” says Anand Rajaraman, senior vice-president of global e-commerce at Wal-Mart and head of @WalmartLabs, the retailer’s division charged with improving its use of the Web.

Today more than ever, intelligent businesses are trying to make sense of millions of tweets, blog posts, comments, reviews, and other forms of unstructured data. The obvious question becomes, “How?”

I’ve written before on this site about collaborative filtering and semantic technologies. For many reasons beyond the scope of this post, companies such as Apple, Google, Facebook, and Amazon benefit from crowdsourcing and the law of large numbers much more than traditional companies. For instance, Apple knows the following about the customers who buy apps through its App Store:

  • which customers like which apps
  • which customers buy (and like) other apps based on the purchase of the first app
  • which customers are more likely to consider buying apps in the same category
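
To make that list a bit more concrete, here is a minimal sketch of the simplest idea behind collaborative filtering: counting which apps tend to be bought together. The customer names, app names, and both function names are made up for illustration. This is not how Apple actually does it, just a toy version of the "customers who bought this also bought that" pattern.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: customer -> set of apps bought.
purchases = {
    "alice": {"Maps", "Chess", "Weather"},
    "bob": {"Maps", "Chess"},
    "carol": {"Chess", "Weather"},
    "dave": {"Maps", "Chess", "Notes"},
}

def co_purchase_counts(purchases):
    """Count how many customers bought each pair of apps."""
    counts = Counter()
    for apps in purchases.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(apps), 2):
            counts[(a, b)] += 1
    return counts

def also_bought(app, purchases, top_n=3):
    """Return the apps most often bought alongside `app`."""
    related = Counter()
    for (a, b), n in co_purchase_counts(purchases).items():
        if a == app:
            related[b] += n
        elif b == app:
            related[a] += n
    return related.most_common(top_n)

print(also_bought("Maps", purchases))
```

With millions of customers instead of four, counts like these start to say something meaningful, which is exactly the advantage the big platforms enjoy.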

Few businesses have the same level of knowledge about their customers. Apple is the exception that proves the rule. In other words, rare is the organization with access to detailed data on millions of its customers, structured or otherwise. What’s a “normal” company to do? Is there nothing it can do but watch from the sidelines?

In a word, no. These emerging applications show tremendous promise.

Now, I won’t pretend to have intimate knowledge of each of these data-mining applications and projects. At a high level, they are designed to help large organizations interpret vast amounts of data. Clearly, developers out there have recognized the need for such applications and have built them according to what they think the market wants.

One application equipped to potentially make sense of all of this unstructured data is Hadoop, an open source development project. From its website, “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.” Certainly worth looking into.
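
For readers wondering what that "simple programming model" looks like, here is a rough sketch of the map/reduce idea in plain Python, counting words across a couple of made-up customer comments. It does not use Hadoop or any of its APIs, and the function names and sample text are my own assumptions. Hadoop's contribution is running steps like these across many machines; this toy version runs them on one.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word.strip(".,!?\"'"), 1

def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were only punctuation
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))

# Hypothetical unstructured data: two short customer comments.
posts = [
    "Great app, great price!",
    "The app crashes. Not great.",
]
counts = word_count(posts)
print(counts["great"], counts["app"])
```

The appeal is that each document can be mapped independently, so the work splits naturally across a cluster.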

The Benefits of Free

Imagine for a moment that you’re a mid-level manager in a large organization. You see the need for a tool that would help you mine data and attempt to find hidden patterns–and ultimately knowledge. Excel just doesn’t cut it. You’d love to see what Hadoop or one of its equivalents can do. Yet, you’re not about to fall on your sword for an unproven product in tight economic times.

Free and open source tools are certainly worth considering. Download Hadoop or another application and begin playing with it. Network with people online and off. Ask them questions. Noodle with different data sets and see if you learn anything about your company, its customers, and underlying trends and drivers. Worst case scenario: you waste a little time.

Simon Says

In any organization, the traditional RFP process can be extremely cumbersome. Times are tight, and it’s entirely possible that even potentially valuable projects like mining unstructured data may not get the go-ahead. What’s more, your organization may have established relationships with companies like IBM that offer proprietary applications and services in the BI space. And, to be sure, Hadoop and other open source/free tools may not meet all of your organization’s needs.

All of this is to say that open source software is no panacea, and this case is no exception. However, doesn’t it behoove you to see what’s out there before making the case for a large expenditure–one that may ultimately not succeed? Is there any real harm in downloading a piece of software just to see what it can do?

Feedback

What say you?