By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data analytics in sports industry
    Here’s How Data Analytics In Sports Is Changing The Game
    6 Min Read
    data analytics on nursing career
    Advances in Data Analytics Are Rapidly Transforming Nursing
    8 Min Read
    data analytics reveals the benefits of MBA
    Data Analytics Technology Proves Benefits of an MBA
    9 Min Read
    data-driven image seo
    Data Analytics Helps Marketers Substantially Boost Image SEO
    8 Min Read
    construction analytics
    5 Benefits of Analytics to Manage Commercial Construction
    5 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Incorporating MapReduce in the Analytics Environment
Share
Notification Show More
Latest News
big data mac performance
Data-Driven Tips to Optimize the Speed of Macs
News
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
Artificial Intelligence
data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics
data analytics reveals the benefits of MBA
Data Analytics Technology Proves Benefits of an MBA
Analytics
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Software > Hadoop > Incorporating MapReduce in the Analytics Environment
AnalyticsHadoopMapReduce

Incorporating MapReduce in the Analytics Environment

BillFranks
Last updated: 2011/12/06 at 1:34 PM
BillFranks
6 Min Read
SHARE

A few weeks ago, I attended the Hadoop World show in New York to hear first-hand how organizations are making use of the new technology.

Contents
Scenario 1: A MapReduce platform as a “live” archiveScenario 2: A MapReduce platform as an ETL and filtering platformScenario 3: A MapReduce platform as an exploration engine

A few weeks ago, I attended the Hadoop World show in New York to hear first-hand how organizations are making use of the new technology. In future postings, I may address what claims I thought entered the hype zone and what value propositions seemed weak. However, I want to focus here on three specific cases where a MapReduce platform, such as Hadoop, clearly has an important and valuable role to play. These three examples alone are enough to justify looking at incorporating MapReduce into your analytics environment.

I am going to use the generic term MapReduce in this column. Hadoop is an open source implementation of the MapReduce framework, but there are others such as Teradata’s Aster Data offering as well. The important part for our purposes here is what a MapReduce framework can do, not which specific implementation you choose to utilize.

Scenario 1: A MapReduce platform as a “live” archive

This is a theme I heard repeatedly, and it was explicitly mentioned by speakers from show sponsor Cloudera. Consider a huge mass of historical data that rarely needs to be accessed. Traditionally, such data would be archived to tape or some other media and sent to storage (if it wasn’t simply deleted). While in theory the data could be accessed if needed, it is difficult and expensive to recover the data. In practice, people rarely leverage the archives.

More Read

data analytics in sports industry

Here’s How Data Analytics In Sports Is Changing The Game

Advances in Data Analytics Are Rapidly Transforming Nursing
Data Analytics Technology Proves Benefits of an MBA
Data Analytics Helps Marketers Substantially Boost Image SEO
5 Benefits of Analytics to Manage Commercial Construction

With disk space being so cheap, the data can now be sent to the inexpensive, commodity hardware within a MapReduce platform. The data is still accessible at any time and is a “live” archive. Perhaps few users will use the data over time, but when they need to, they can. MapReduce is an inexpensive way to enable such an archive. It wouldn’t make sense to keep this archive live in a more expensive, formal data warehousing environment.

Scenario 2: A MapReduce platform as an ETL and filtering platform

One of the biggest challenges with big data sources such as web logs, sensor data, or even masses of email messages, is the process of extracting the key pieces of valuable information from the noise. Loading raw web logs into a database system to then throw away 90% or more of the data during processing isn’t the best way to go. Loading large, raw files into a MapReduce environment for initial processing makes terrific sense.

A MapReduce platform can be used to read in the raw data, apply appropriate filters and logic against it, and output a more structured, usable set of data. That

reduced set of data can then be further analyzed in the MapReduce environment or migrated into a traditional analysis environment. The key is that only the important pieces of data remain, which makes it much more manageable. Typically, only a small percentage of a raw big data feed is required for a given business problem. MapReduce is a great tool to extract those pieces.

Scenario 3: A MapReduce platform as an exploration engine

Another recurrent theme at the show was the concept of a MapReduce platform being used for discovery and exploratory analysis. This is another solid application for MapReduce. Once raw data has been read and processed, further analysis can be done against the data within the MapReduce environment. As always, many paths of analysis may be tried before a successful one is found. Once the data is in a MapReduce setting, utilizing tools to analyze it where it already sits makes sense.

This scenario leads to a major decision point, however. Once a set of data is found to have high value via analysis in MapReduce, an important next step is to combine the new data with existing data. This is so that each data source can be made even more valuable by being combined with the others. Once you have distilled the data down to what is important, it should be loaded into the corporate systems that users have wide access to. It doesn’t make sense to pull all of the data out of a data warehouse, for example, just to match it with one new source of big data. It makes more sense to load the one new source of data alongside all the other pre-existing data within a data warehouse.

That last point is one where those loyal to MapReduce may differ with me. Many discussions at Hadoop World suggested that it does make sense to pull all corporate data into MapReduce. I predict that in the long run, however, things will go as I suggest above. Keeping data movement to a minimum is essential, as is making it available to as wide an audience as possible. For these reasons, MapReduce environments will augment, rather than replace, traditional environments.

BillFranks December 6, 2011
Share this Article
Facebook Twitter Pinterest LinkedIn
Share
By BillFranks
Follow:
Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.

Follow us on Facebook

Latest News

big data mac performance
Data-Driven Tips to Optimize the Speed of Macs
News
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
3 Ways AI Has Helped Marketers and Creative Professionals Streamline Workflows
Artificial Intelligence
data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

data analytics in sports industry
Big Data

Here’s How Data Analytics In Sports Is Changing The Game

6 Min Read
data analytics on nursing career
Analytics

Advances in Data Analytics Are Rapidly Transforming Nursing

8 Min Read
data analytics reveals the benefits of MBA
Analytics

Data Analytics Technology Proves Benefits of an MBA

9 Min Read
data-driven image seo
Analytics

Data Analytics Helps Marketers Substantially Boost Image SEO

8 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence
AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?