Incorporating MapReduce in the Analytics Environment

Bill Franks

A few weeks ago, I attended the Hadoop World show in New York to hear first-hand how organizations are making use of the new technology. In future postings, I may address what claims I thought entered the hype zone and what value propositions seemed weak. However, I want to focus here on three specific cases where a MapReduce platform, such as Hadoop, clearly has an important and valuable role to play. These three examples alone are enough to justify looking at incorporating MapReduce into your analytics environment.

I am going to use the generic term MapReduce in this column. Hadoop is an open source implementation of the MapReduce framework, but there are others, such as Teradata’s Aster Data offering. The important part for our purposes here is what a MapReduce framework can do, not which specific implementation you choose to utilize.

Scenario 1: A MapReduce platform as a “live” archive

This is a theme I heard repeatedly, and it was explicitly mentioned by speakers from show sponsor Cloudera. Consider a huge mass of historical data that rarely needs to be accessed. Traditionally, such data would be archived to tape or some other media and sent to storage (if it wasn’t simply deleted). While in theory the data could be accessed if needed, it is difficult and expensive to recover the data. In practice, people rarely leverage the archives.

With disk space being so cheap, the data can now be sent to the inexpensive, commodity hardware within a MapReduce platform. The data is still accessible at any time and is a “live” archive. Perhaps few users will use the data over time, but when they need to, they can. MapReduce is an inexpensive way to enable such an archive. It wouldn’t make sense to keep this archive live in a more expensive, formal data warehousing environment.

Scenario 2: A MapReduce platform as an ETL and filtering platform

One of the biggest challenges with big data sources such as web logs, sensor data, or even masses of email messages, is the process of extracting the key pieces of valuable information from the noise. Loading raw web logs into a database system to then throw away 90% or more of the data during processing isn’t the best way to go. Loading large, raw files into a MapReduce environment for initial processing makes terrific sense.

A MapReduce platform can be used to read in the raw data, apply appropriate filters and logic against it, and output a more structured, usable set of data. That reduced set of data can then be further analyzed in the MapReduce environment or migrated into a traditional analysis environment. The key is that only the important pieces of data remain, which makes it much more manageable. Typically, only a small percentage of a raw big data feed is required for a given business problem. MapReduce is a great tool to extract those pieces.
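
To make the filtering idea concrete, here is a minimal, single-machine sketch in Python of the map and reduce steps described above. It is not Hadoop or Aster Data code. The log format, the `/product/` path rule, the sample lines, and every function name are assumptions invented for illustration. A real MapReduce platform would distribute the same two steps across many machines.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) web log layout: ip, two ignored fields, timestamp,
# request line, status code, response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def map_record(line):
    """Map step: parse one raw line and emit (path, 1) only if it is useful."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return  # unparseable noise is dropped
    if match["status"] != "200":
        return  # failed requests are dropped
    path = match["path"]
    if not path.startswith("/product/"):
        return  # images, scripts, and other non-product hits are dropped
    yield path, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run the map step over every line, then reduce the surviving pairs."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_counts(pairs)


# Made-up sample data: six raw lines, of which only three carry useful signal.
RAW_LOG = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /static/logo.png HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:00:02 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.4 - - [01/Jan/2024:10:00:03 +0000] "GET /product/7 HTTP/1.1" 404 0',
    'corrupted line',
    '10.0.0.5 - - [01/Jan/2024:10:00:04 +0000] "GET /product/7 HTTP/1.1" 200 300',
]

if __name__ == "__main__":
    print(run_job(RAW_LOG))
```

In this toy run, half of the raw lines are discarded before anything is counted, and the output is a small table of product paths and view counts rather than the raw feed.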

Scenario 3: A MapReduce platform as an exploration engine

Another recurrent theme at the show was the concept of a MapReduce platform being used for discovery and exploratory analysis. This is another solid application for MapReduce. Once raw data has been read and processed, further analysis can be done against the data within the MapReduce environment. As always, many paths of analysis may be tried before a successful one is found. Once the data is in a MapReduce setting, utilizing tools to analyze it where it already sits makes sense.

This scenario leads to a major decision point, however. Once a set of data is found to have high value via analysis in MapReduce, an important next step is to combine the new data with existing data. This is so that each data source can be made even more valuable by being combined with the others. Once you have distilled the data down to what is important, it should be loaded into the corporate systems that users have wide access to. It doesn’t make sense to pull all of the data out of a data warehouse, for example, just to match it with one new source of big data. It makes more sense to load the one new source of data alongside all the other pre-existing data within a data warehouse.
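
As a rough illustration of loading one distilled source next to existing data, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. The table names, columns, and sample rows are hypothetical and chosen only to show the shape of the idea: the small, distilled data set moves to where the existing data already lives, and the join happens there.

```python
import sqlite3


def load_and_join(existing_customers, distilled_events):
    """Load a distilled data set beside existing data and join the two."""
    # An in-memory SQLite database stands in for the warehouse (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)"
    )
    conn.execute(
        "CREATE TABLE web_events (customer_id INTEGER, product_views INTEGER)"
    )
    # Pre-existing warehouse data.
    conn.executemany("INSERT INTO customers VALUES (?, ?)", existing_customers)
    # The one new source, already reduced to what matters.
    conn.executemany("INSERT INTO web_events VALUES (?, ?)", distilled_events)
    rows = conn.execute(
        "SELECT c.segment, SUM(w.product_views) "
        "FROM customers c JOIN web_events w ON w.customer_id = c.customer_id "
        "GROUP BY c.segment ORDER BY c.segment"
    ).fetchall()
    conn.close()
    return rows


# Made-up sample rows for illustration only.
CUSTOMERS = [(1, "loyal"), (2, "new"), (3, "loyal")]
DISTILLED_EVENTS = [(1, 5), (2, 2), (3, 4)]

if __name__ == "__main__":
    print(load_and_join(CUSTOMERS, DISTILLED_EVENTS))
```

Only the distilled rows move. The customer table never leaves the system that users already query.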

That last point is one where those loyal to MapReduce may differ with me. Many discussions at Hadoop World suggested that it does make sense to pull all corporate data into MapReduce. I predict that in the long run, however, things will go as I suggest above. Keeping data movement to a minimum is essential, as is making it available to as wide an audience as possible. For these reasons, MapReduce environments will augment, rather than replace, traditional environments.

Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.