Hadoop and Spark: Better Together

kingmesal

The various online reports about the end of Hadoop as a big data framework bring to mind Mark Twain’s famous quip that reports of his death had been greatly exaggerated. Hadoop is very much alive, and numerous organizations continue to make it a key component of their big data and analytics initiatives.

A newer big data framework, Apache Spark, has been described as a possible replacement for Hadoop. Some view Spark as being more accessible and powerful than the older framework, and therefore more suitable for emerging big data and analytics projects.

The fact is, rather than being a replacement for Hadoop, Spark can serve as a complement to it, and Hadoop can remain a viable component of big data strategies. Spark can either run on top of Hadoop, leveraging its cluster manager and underlying storage, or separately from the framework, integrating with alternative cluster managers and storage platforms.

Hadoop now includes the YARN cluster manager, which the Apache Software Foundation refers to as MapReduce 2.0, a complete overhaul of MapReduce. While Hadoop MapReduce can be used effectively for workloads such as log file processing and static batch jobs, other processing tasks can be assigned to different processing engines such as Spark. YARN handles the management and allocation of cluster resources.

Organizations can integrate Hadoop with Spark for a number of purposes. One is cluster administration; another is data management, including business continuity.

While Spark is a general-purpose data processing engine that is suitable for a variety of projects, it’s not currently designed to handle the data management and cluster administration functions associated with running data processing and analysis workloads at scale. Hadoop and its associated projects, however, can handle these tasks effectively.

By integrating Spark with Hadoop, organizations can leverage many of the Hadoop capabilities that production environments require. These include the YARN resource manager, which handles scheduling tasks across available nodes in the cluster; the Hadoop Distributed File System (or MapR-FS), which stores data when the cluster runs out of free memory and which also stores historical data when Spark isn’t running; and the disaster recovery capabilities that are inherent in Hadoop.

Furthermore, Hadoop provides enhanced data security, which is critical for production workloads, especially in heavily regulated industries such as financial services and healthcare; and a distributed data platform, which enables Spark workloads to be deployed on available resources anywhere in a distributed cluster, without the need to manually allocate and track individual tasks.
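
To make the two deployment options concrete, here is a minimal Python sketch that only assembles the spark-submit command line for each case; it does not launch anything. The `--master`, `--deploy-mode` and `--num-executors` options are standard spark-submit options, but the script name `log_report.py`, the data paths and the `spark-master` host are hypothetical placeholders, not details from any real deployment.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    app_path: str,
    input_uri: str,
    master: str = "yarn",
    deploy_mode: str = "cluster",
    num_executors: Optional[int] = None,
) -> List[str]:
    """Build the argument list for a spark-submit call (nothing is executed)."""
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if num_executors is not None:
        # Only added when the caller asks for a fixed executor count
        args += ["--num-executors", str(num_executors)]
    # The application script comes last, followed by its own arguments
    args += [app_path, input_uri]
    return args


# Spark on Hadoop: YARN schedules the work and HDFS holds the data.
# (Script name and path are hypothetical.)
on_hadoop = build_spark_submit(
    "log_report.py", "hdfs:///data/logs/", num_executors=4
)

# Spark without Hadoop: a standalone Spark master and a non-HDFS source.
# (Host name and path are hypothetical.)
standalone = build_spark_submit(
    "log_report.py",
    "file:///data/logs/",
    master="spark://spark-master:7077",
    deploy_mode="client",
)

print(shlex.join(on_hadoop))
print(shlex.join(standalone))
```

The application itself is identical in both cases; only the cluster manager and the storage location change, which is what allows Spark to complement an existing Hadoop cluster rather than replace it.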

When it comes to the benefits of using these two platforms together, it’s by no means a one-way street; Spark can certainly add value to Hadoop as well. For example, Spark’s machine learning module can provide capabilities that are not easily exploited in Hadoop without the use of Spark.

The original design goal of the newer framework, to allow fast in-memory processing of large data volumes, is a key contribution to the capabilities of a Hadoop cluster.

There is no doubt that newer big data frameworks such as Spark are gaining momentum. By the beginning of 2014, Spark had become one of the Apache Software Foundation’s top-level projects, and today it is one of the foundation’s most active projects.

As of early 2015, surveys were showing that more than 500 organizations were using Spark in production, according to the foundation. These include Amazon, eBay, NASA, Yahoo!, IBM and many other entities. Many organizations are running Spark on clusters of thousands of nodes, the foundation says, and the largest known cluster has some 8,000 nodes. In terms of data size, Spark has been shown to work well up to petabytes, it says.

But as pointed out earlier, none of this means the end of Hadoop, and industry research bears this out. According to a June 2015 report by market research firm MarketAnalysis.com, the Hadoop market is forecast to grow at a compound annual growth rate (CAGR) of 58%, surpassing $1 billion by 2020.

Hadoop has become “an integral part of almost any commercially available big data solution and de-facto industry standard for business intelligence (BI),” the report notes. More and more organizations are gravitating toward Hadoop and the functionality that it offers, the report says.

Among the interesting trends that have emerged in the Hadoop market in recent years, it says, are the shift from batch processing to online processing; the emergence of MapReduce alternatives such as Spark, Storm and DataTorrent; in-house Hadoop development and deployment; the growth of the Internet of Things (IoT) and all the data it will bring; and the emergence of niche companies focused on enhancing Hadoop features and functionality.

Despite some setbacks, “there are indications that Hadoop is here to stay and grow, though the rapid growth period is still a few years ahead,” the study says.

IT and business executives would be wise to consider that the two big data frameworks, Hadoop and Spark, can work hand in hand to give organizations even greater value from their big data endeavors.

Explore more of Spark’s benefits with the free interactive ebook: Getting Started with Spark: From Inception to Production, by James A. Scott.
