SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.
Incorporating MapReduce in the Analytics Environment

Bill Franks

Contents
  • Scenario 1: A MapReduce platform as a “live” archive
  • Scenario 2: A MapReduce platform as an ETL and filtering platform
  • Scenario 3: A MapReduce platform as an exploration engine

A few weeks ago, I attended the Hadoop World show in New York to hear first-hand how organizations are making use of the new technology. In future postings, I may address what claims I thought entered the hype zone and what value propositions seemed weak. However, I want to focus here on three specific cases where a MapReduce platform, such as Hadoop, clearly has an important and valuable role to play. These three examples alone are enough to justify looking at incorporating MapReduce into your analytics environment.

I am going to use the generic term MapReduce in this column. Hadoop is an open source implementation of the MapReduce framework, but there are others, such as Teradata’s Aster Data offering. What matters for our purposes here is what a MapReduce framework can do, not which specific implementation you choose to use.
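To make "what a MapReduce framework can do" concrete, here is a minimal single-process sketch of the programming model in Python: a map phase that emits key/value pairs, a shuffle that groups them by key, and a reduce phase that combines each group. It is illustrative only. The `map_reduce` helper and the word-count functions are hypothetical names, not the API of Hadoop, Aster Data, or any other product, and a real framework would spread these phases across many machines.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types: a mapper turns one record into (key, value) pairs,
# and a reducer combines all values that share a key into one result.
Mapper = Callable[[Any], Iterable[Tuple[Any, Any]]]
Reducer = Callable[[Any, List[Any]], Any]


def map_reduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Any, Any]:
    """Run map, shuffle, and reduce in one process (a sketch, not a framework)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        # Map phase: each record is processed independently.
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted for the same key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word: str, counts: List[int]) -> int:
    """Add up the partial counts emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data big value", "data tools"]
    # Prints {'big': 2, 'data': 2, 'value': 1, 'tools': 1}
    print(map_reduce(lines, word_count_mapper, sum_reducer))
```

Because the mapper only ever sees one record at a time and the reducer only ever sees one key's values, both can be run in parallel on separate machines; that property, rather than any particular implementation, is what the scenarios below rely on.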

Scenario 1: A MapReduce platform as a “live” archive

This is a theme I heard repeatedly, and it was explicitly mentioned by speakers from show sponsor Cloudera. Consider a huge mass of historical data that rarely needs to be accessed. Traditionally, such data would be archived to tape or some other media and sent to storage (if it wasn’t simply deleted). While in theory the data could be accessed if needed, recovering it is difficult and expensive. In practice, people rarely use the archives.

Because disk space is now so cheap, the data can instead be stored on the inexpensive commodity hardware of a MapReduce platform. There it remains accessible at any time, forming a “live” archive. Perhaps few users will touch the data over time, but when they need it, they can get to it. MapReduce is an inexpensive way to enable such an archive; it wouldn’t make sense to keep this archive live in a more expensive, formal data warehousing environment.

Scenario 2: A MapReduce platform as an ETL and filtering platform

One of the biggest challenges with big data sources such as web logs, sensor data, or even masses of email messages is extracting the key pieces of valuable information from the noise. Loading raw web logs into a database system only to throw away 90% or more of the data during processing isn’t the best approach. Loading large, raw files into a MapReduce environment for initial processing makes terrific sense.

A MapReduce platform can be used to read in the raw data, apply appropriate filters and logic against it, and output a more structured, usable set of data. That reduced set of data can then be further analyzed in the MapReduce environment or migrated into a traditional analysis environment. The key is that only the important pieces of data remain, which makes the data much more manageable. Typically, only a small percentage of a raw big data feed is required for a given business problem. MapReduce is a great tool to extract those pieces.
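As a rough illustration of this filtering step, the Python sketch below plays the role of a map and reduce pair over raw web log lines. The log layout (`<timestamp> <user_id> <url> <http_status>`), the rule of keeping only successful `/product/` page views, and the function names are all assumptions made for the example, not a real log format or a Hadoop API. The mapper discards everything that is not relevant, including malformed lines, and the reducer collapses what remains into one compact, structured record per user.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw log format assumed for this sketch:
#   "<timestamp> <user_id> <url> <http_status>"
PRODUCT_PREFIX = "/product/"


def filter_mapper(line: str) -> Iterator[Tuple[str, str]]:
    """Keep only successful product-page views; drop everything else."""
    parts = line.split()
    if len(parts) != 4:
        return  # malformed line: skip it rather than fail the whole job
    _timestamp, user_id, url, status = parts
    if status == "200" and url.startswith(PRODUCT_PREFIX):
        yield user_id, url[len(PRODUCT_PREFIX):]


def products_reducer(user_id: str, product_ids: List[str]) -> List[str]:
    """Collapse one user's views into a sorted list of distinct products."""
    return sorted(set(product_ids))


def filter_web_logs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical driver: map every line, group by user, then reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for user_id, product_id in filter_mapper(line):
            groups[user_id].append(product_id)
    return {user: products_reducer(user, ids) for user, ids in groups.items()}


if __name__ == "__main__":
    raw = [
        "10:00:01 u1 /product/42 200",
        "10:00:02 u1 /static/logo.png 200",  # not a product page: dropped
        "10:00:03 u2 /product/7 404",  # failed request: dropped
        "garbled line",  # malformed: dropped
        "10:00:04 u1 /product/42 200",  # repeat view: deduplicated
        "10:00:05 u2 /product/9 200",
    ]
    print(filter_web_logs(raw))  # {'u1': ['42'], 'u2': ['9']}
```

Six raw lines shrink to two small records, which is the kind of reduced, structured output that can then be analyzed further or moved into a traditional analysis environment.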

Scenario 3: A MapReduce platform as an exploration engine

Another recurrent theme at the show was the concept of a MapReduce platform being used for discovery and exploratory analysis. This is another solid application for MapReduce. Once raw data has been read and processed, further analysis can be done against the data within the MapReduce environment. As always, many paths of analysis may be tried before a successful one is found. Once the data is in a MapReduce setting, utilizing tools to analyze it where it already sits makes sense.

This scenario leads to a major decision point, however. Once analysis in MapReduce shows that a set of data has high value, an important next step is to combine the new data with existing data, because each data source becomes more valuable when it is combined with the others. Once you have distilled the data down to what is important, it should be loaded into the corporate systems that users have wide access to. It doesn’t make sense to pull all of the data out of a data warehouse, for example, just to match it with one new source of big data. It makes more sense to load the one new source of data alongside all the other pre-existing data within the data warehouse.
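The direction of data movement argued for here can be sketched in a few lines of Python. In this example an in-memory SQLite database (via the standard `sqlite3` module) merely stands in for a corporate data warehouse, and the `customers` and `product_views` tables, their columns, and the `join_with_warehouse` function are hypothetical. The small, already-distilled data set is loaded next to the existing table and joined where the data sits, instead of exporting the warehouse contents to match them against the new source elsewhere.

```python
import sqlite3
from typing import List, Tuple


def join_with_warehouse(distilled: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Load distilled (customer_id, product_id) rows next to existing data and join."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    try:
        cur = conn.cursor()
        # Pre-existing warehouse table (hypothetical schema and rows).
        cur.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, segment TEXT)")
        cur.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            [("u1", "loyalty"), ("u2", "new"), ("u3", "loyalty")],
        )
        # The new source, already reduced elsewhere, is loaded alongside it.
        cur.execute("CREATE TABLE product_views (customer_id TEXT, product_id TEXT)")
        cur.executemany("INSERT INTO product_views VALUES (?, ?)", distilled)
        # Combine the sources in place: distinct products viewed per customer.
        # LEFT JOIN keeps customers with no views; COUNT(DISTINCT ...) ignores NULLs.
        cur.execute(
            """
            SELECT c.customer_id, c.segment, COUNT(DISTINCT v.product_id)
            FROM customers AS c
            LEFT JOIN product_views AS v ON v.customer_id = c.customer_id
            GROUP BY c.customer_id, c.segment
            ORDER BY c.customer_id
            """
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    rows = join_with_warehouse([("u1", "42"), ("u2", "9"), ("u2", "7")])
    print(rows)  # [('u1', 'loyalty', 1), ('u2', 'new', 2), ('u3', 'loyalty', 0)]
```

Only the three distilled rows move; the customer table never leaves the database, and the combined result is available to anyone who already has access to that system.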

That last point is one where those loyal to MapReduce may differ with me. Many discussions at Hadoop World suggested that it does make sense to pull all corporate data into MapReduce. I predict that in the long run, however, things will go as I suggest above. Keeping data movement to a minimum is essential, as is making it available to as wide an audience as possible. For these reasons, MapReduce environments will augment, rather than replace, traditional environments.

Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.
