© 2008-25 SmartData Collective. All Rights Reserved.

Incorporating MapReduce in the Analytics Environment

Bill Franks

Contents
  • Scenario 1: A MapReduce platform as a “live” archive
  • Scenario 2: A MapReduce platform as an ETL and filtering platform
  • Scenario 3: A MapReduce platform as an exploration engine

A few weeks ago, I attended the Hadoop World show in New York to hear first-hand how organizations are making use of the new technology. In future postings, I may address which claims I thought entered the hype zone and which value propositions seemed weak. However, I want to focus here on three specific cases where a MapReduce platform, such as Hadoop, clearly has an important and valuable role to play. These three examples alone are enough to justify looking at incorporating MapReduce into your analytics environment.

I am going to use the generic term MapReduce in this column. Hadoop is an open source implementation of the MapReduce framework, but there are others, such as Teradata’s Aster Data offering. The important part for our purposes here is what a MapReduce framework can do, not which specific implementation you choose to use.

Scenario 1: A MapReduce platform as a “live” archive

This is a theme I heard repeatedly, and it was explicitly mentioned by speakers from show sponsor Cloudera. Consider a huge mass of historical data that rarely needs to be accessed. Traditionally, such data would be archived to tape or some other medium and sent to storage (if it wasn’t simply deleted). While in theory the data could be accessed if needed, recovering it is difficult and expensive. In practice, people rarely leverage the archives.

With disk space being so cheap, the data can now be sent to the inexpensive, commodity hardware within a MapReduce platform. The data is still accessible at any time and is a “live” archive. Perhaps few users will use the data over time, but when they need to, they can. MapReduce is an inexpensive way to enable such an archive. It wouldn’t make sense to keep this archive live in a more expensive, formal data warehousing environment.

Scenario 2: A MapReduce platform as an ETL and filtering platform

One of the biggest challenges with big data sources such as web logs, sensor data, or even masses of email messages, is the process of extracting the key pieces of valuable information from the noise. Loading raw web logs into a database system to then throw away 90% or more of the data during processing isn’t the best way to go. Loading large, raw files into a MapReduce environment for initial processing makes terrific sense.

A MapReduce platform can be used to read in the raw data, apply appropriate filters and logic against it, and output a more structured, usable set of data. That reduced set of data can then be further analyzed in the MapReduce environment or migrated into a traditional analysis environment. The key is that only the important pieces of data remain, which makes it much more manageable. Typically, only a small percentage of a raw big data feed is required for a given business problem. MapReduce is a great tool to extract those pieces.
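To make the filtering idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps in plain Python. It is not Hadoop code: the log format, the sample lines, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made up for illustration. The point is only that the map step can discard noise before anything downstream sees it.

```python
from collections import defaultdict

# Hypothetical raw web log lines in the form "timestamp user_id url status".
RAW_LOG = [
    "2011-11-08T10:00:01 u1 /home 200",
    "2011-11-08T10:00:02 u2 /favicon.ico 200",
    "2011-11-08T10:00:03 u1 /product/42 200",
    "2011-11-08T10:00:04 u3 /product/42 500",
    "malformed line",
    "2011-11-08T10:00:05 u2 /product/7 200",
    "2011-11-08T10:00:06 u3 /product/42 200",
]


def map_phase(line):
    """Emit (url, 1) for successful product page views; drop everything else."""
    fields = line.split()
    if len(fields) != 4:
        return  # filter out malformed records
    _timestamp, _user, url, status = fields
    if status == "200" and url.startswith("/product/"):
        yield url, 1


def shuffle(pairs):
    """Group mapped values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single structured record."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


product_views = run_job(RAW_LOG)
print(product_views)
```

In this toy run, seven raw lines shrink to two summary records. A real MapReduce platform would distribute the same map and reduce logic across many machines and far larger files.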

Scenario 3: A MapReduce platform as an exploration engine

Another recurrent theme at the show was the concept of a MapReduce platform being used for discovery and exploratory analysis. This is another solid application for MapReduce. Once raw data has been read and processed, further analysis can be done against the data within the MapReduce environment. As always, many paths of analysis may be tried before a successful one is found. Once the data is in a MapReduce setting, utilizing tools to analyze it where it already sits makes sense.

This scenario leads to a major decision point, however. Once a set of data is found to have high value via analysis in MapReduce, an important next step is to combine the new data with existing data, because each data source becomes even more valuable when combined with the others. Once you have distilled the data down to what is important, it should be loaded into the corporate systems that users have wide access to. It doesn’t make sense to pull all of the data out of a data warehouse, for example, just to match it with one new source of big data. It makes more sense to load the one new source of data alongside all the other pre-existing data within a data warehouse.
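As a rough sketch of loading one distilled source alongside existing data, the example below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a data warehouse. The table names, columns, and values are hypothetical; a real warehouse would have its own schema and a bulk loader.

```python
import sqlite3

# In-memory SQLite database standing in for the corporate data warehouse.
warehouse = sqlite3.connect(":memory:")

# Pre-existing warehouse data (hypothetical product master table).
warehouse.execute("CREATE TABLE products (url TEXT PRIMARY KEY, name TEXT)")
warehouse.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("/product/42", "Widget"), ("/product/7", "Gadget")],
)

# Distilled output of a MapReduce job (hypothetical page-view counts).
distilled_views = {"/product/42": 2, "/product/7": 1}

# Load only the small, distilled data set into the warehouse.
warehouse.execute("CREATE TABLE product_views (url TEXT PRIMARY KEY, views INTEGER)")
warehouse.executemany(
    "INSERT INTO product_views VALUES (?, ?)",
    distilled_views.items(),
)

# The new source can now be combined with existing data where users already work.
rows = warehouse.execute(
    "SELECT p.name, v.views "
    "FROM products AS p JOIN product_views AS v ON v.url = p.url "
    "ORDER BY v.views DESC"
).fetchall()
print(rows)
```

Only the small summary table moves. The existing warehouse tables stay where they are, which is the direction of data movement argued for above.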

That last point is one where those loyal to MapReduce may differ with me. Many discussions at Hadoop World suggested that it does make sense to pull all corporate data into MapReduce. I predict that in the long run, however, things will go as I suggest above. Keeping data movement to a minimum is essential, as is making it available to as wide an audience as possible. For these reasons, MapReduce environments will augment, rather than replace, traditional environments.

Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries, ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
