SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Spark vs. Hadoop: Not Enemies, but Sidekicks

kingmesal
Last updated: 2015/09/21 at 8:00 AM

You may have heard a lot about Apache Spark and everything it can do, and you might be wondering: whatever happened to Hadoop? After all, MapR is still one of the biggest Hadoop distribution providers. You might think Spark is replacing Hadoop, but that’s far from the case. Spark represents the next step for Hadoop.

Spark Depends on Hadoop

Hadoop was revolutionary when it burst onto the scene in 2005. Google had gone public, and Facebook had been founded, just a year earlier. The cloud was still something you saw only in the sky or on a Windows XP desktop.


The original Hadoop MapReduce made it possible for data centers to process large amounts of data quickly and reliably across many rack-mounted commodity servers. A cluster could scale out by adding more servers instead of migrating everything to a new, bigger machine. With Web 2.0 arriving on the scene, web companies simply didn’t have the luxury of waiting to move everything over to new systems.
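
The MapReduce model itself is simple enough to sketch in a few lines. The Python below is a single-process toy that runs the map, shuffle, and reduce phases of a word count; on a real cluster, Hadoop runs many map and reduce tasks in parallel across servers. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not part of Hadoop’s API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    # On a cluster, each server would map its own slice of the input.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, as the framework
    # does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: combine all values for a key; for word count, sum them.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop stores data", "Spark processes data"]
    print(word_count(docs))
```

With these three sample documents, the counts for `spark`, `hadoop`, and `data` each come out to 2. Keeping each phase a pure function of its input is what makes it safe for a framework to rerun a failed task on another server.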

Hadoop is the platform of choice for big data at companies like eBay and Yahoo, and Spark is becoming the platform of choice for both streaming and batch processing of big data. Yet Spark is a processing engine, not a storage system, so it typically still relies on HDFS to handle the actual data storage. This means that, far from going away, Hadoop is likely to become even more important in the future.

Altiscale CTO Raymie Stata agrees:

“To position Spark in opposition to Hadoop is like saying that your new electric car is so cool that you won’t need electricity anymore. If anything, electric cars will drive demand for more electricity.”

What Hadoop Offers

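Hadoop’s biggest asset is the Hadoop Distributed File System (HDFS). As the name suggests, it’s a file system for distributing large amounts of data across many servers: it splits files into large blocks and stores copies of each block on several machines, so losing one server doesn’t lose data. This reliable storage layer serves as the foundation for distributed processing engines.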
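
To illustrate how HDFS spreads a file across servers, here is a simplified Python model of two steps: splitting a file into fixed-size blocks and assigning each block’s replicas to distinct servers. The constants mirror the HDFS defaults in Hadoop 2.x (128 MB blocks, three replicas), but the round-robin placement is a hypothetical simplification; real HDFS lets the NameNode choose replica locations with a rack-aware policy. `Block`, `split_into_blocks`, and `place_replicas` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default HDFS block size in Hadoop 2.x
REPLICATION = 3        # default HDFS replication factor


@dataclass
class Block:
    index: int
    offset: int  # byte offset of the block within the file
    length: int  # bytes in this block; only the last block may be shorter


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Block]:
    # Cut the file into consecutive fixed-size blocks.
    blocks: List[Block] = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append(Block(index=len(blocks), offset=offset, length=length))
        offset += length
    return blocks


def place_replicas(blocks: List[Block], nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    # Hypothetical round-robin policy (NOT the real HDFS policy): block i
    # starts at node i and takes the next `replication` nodes, so every
    # copy of a block sits on a different server.
    if replication > len(nodes):
        raise ValueError("need at least as many servers as replicas")
    return {
        b.index: [nodes[(b.index + r) % len(nodes)] for r in range(replication)]
        for b in blocks
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    for block in blocks:
        print(block)
    print(place_replicas(blocks, ["node1", "node2", "node3", "node4"]))
```

For a 300 MB file, `split_into_blocks` yields two full 128 MB blocks and a final 44 MB block, and with four servers each block’s three copies land on three different machines.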

What Spark Offers

Hadoop offers a choice of compute engines under YARN, its resource manager, and Spark is one of them. Historically, the most popular has been Hadoop MapReduce, an open-source implementation of the programming model Google described in its 2004 MapReduce paper and used to build its web search index. With millions of new pages appearing online each day, the engine was designed to process large amounts of data, but it is geared toward batch jobs. No one complains that their page doesn’t show up in Google’s search results the second it’s uploaded. Well, except SEO consultants, perhaps.

Spark, on the other hand, adds in-memory batch and stream processing to the mix. It does this primarily through RDDs, or Resilient Distributed Datasets: immutable collections, partitioned across the cluster, that Spark can keep in memory between operations. Reading data from memory is much faster than rereading it from hard drives.

With Spark, each transformation on an RDD (such as a map or a filter) yields a new RDD. You might expect all those intermediate RDDs to hurt performance, but they don’t, because of how transformations are evaluated. Spark borrows another concept from functional programming: lazy evaluation. When you define a transformation, Spark only records it; nothing is computed until you call an action that requires a result, such as printing the contents of a file or counting the words in a document.
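
Lazy evaluation is easier to see in code. The Python class below is a hypothetical, single-machine stand-in for an RDD, not Spark’s API: `map` and `filter` only record a step and return a new dataset, while the actions `collect` and `count` run the recorded chain. A call log shows that no work happens until an action is called.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Step = Callable[[Iterator[Any]], Iterator[Any]]


class LazyDataset:
    """Hypothetical toy stand-in for an RDD; not Spark's API."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()):
        self._source = source  # must be re-iterable, e.g. a list or range
        self._steps = steps    # recorded transformations (the "lineage")

    # Transformations: record a step and return a new, unevaluated dataset.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (lambda it: (fn(x) for x in it),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (lambda it: (x for x in it if pred(x)),))

    def _compute(self) -> Iterator[Any]:
        # Chain the recorded steps so each item flows through all of them in one pass.
        it: Iterator[Any] = iter(self._source)
        for step in self._steps:
            it = step(it)
        return it

    # Actions: force evaluation of the whole chain.
    def collect(self) -> List[Any]:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


if __name__ == "__main__":
    calls: List[int] = []

    def square(x: int) -> int:
        calls.append(x)  # log each time the function actually runs
        return x * x

    odd_squares = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
    print("calls before action:", len(calls))
    print("result:", odd_squares.collect())
    print("calls after action:", len(calls))
```

Before `collect()` the log is empty; afterward it has recorded five calls and the result is `[1, 9, 25]`. Like an RDD that has not been cached, this toy recomputes the whole chain from its source on every action; in Spark, calling `cache()` or `persist()` on an RDD keeps its computed partitions in memory for reuse.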

Because RDDs aren’t computed until an action needs them, Spark sees the whole chain of transformations before running it, which gives you a lot of flexibility in processing. Spark Streaming applies the same model to live data by processing it in micro-batches, and you can even feed that data into Spark’s machine learning library, MLlib.
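
Micro-batching can be sketched the same way. The Python below is a simplified model of the idea, not Spark Streaming’s API: it groups a stream of `(timestamp, payload)` events into fixed-width time intervals and hands each interval’s events to ordinary batch code. It assumes events arrive in timestamp order, and `micro_batches` and `run_stream` are hypothetical names.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple


def micro_batches(events: Iterable[Tuple[float, Any]],
                  interval: float) -> Iterator[Tuple[int, List[Any]]]:
    # Assign each event to the batch covering its timestamp:
    # batch k spans [k * interval, (k + 1) * interval).
    current_id: Optional[int] = None
    buffer: List[Any] = []
    for timestamp, payload in events:
        batch_id = int(timestamp // interval)
        if current_id is not None and batch_id != current_id:
            # The previous interval is complete, so emit it as one batch.
            yield current_id, buffer
            buffer = []
        current_id = batch_id
        buffer.append(payload)
    if current_id is not None:
        # Flush the final, possibly partial, interval.
        yield current_id, buffer


def run_stream(events: Iterable[Tuple[float, Any]], interval: float,
               process_batch: Callable[[List[Any]], Any]) -> List[Tuple[int, Any]]:
    # Apply ordinary batch logic to each micro-batch in turn.
    return [(batch_id, process_batch(batch))
            for batch_id, batch in micro_batches(events, interval)]


if __name__ == "__main__":
    events = [(0.2, "spark"), (0.7, "hadoop"), (1.1, "spark"),
              (1.9, "spark"), (2.5, "hdfs")]
    # Each micro-batch is processed with plain batch logic (here, a word count).
    for batch_id, counts in run_stream(events, interval=1.0, process_batch=Counter):
        print(batch_id, dict(counts))
```

The point of the design is that the per-batch function is ordinary batch code; this sketch simply skips intervals that contain no events.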

Dynamic languages like Python have opened up new ways to program, letting you develop algorithms interactively instead of going through the write/compile/test/debug cycle of C, not to mention chasing the inevitable memory-management bugs.

Apache Spark brings that interactive flexibility to big data through its shells for Scala and Python, letting you develop new applications and explore your data for trends quickly. Business moves fast, and if you want to stay ahead of your competitors, you’re going to have to move even faster. With Spark, you can build on the solid foundation of Hadoop and HDFS while saving valuable programmer time.

To give an analogy, Windows 10 launched recently, and while it has some new features, in many ways it’s similar to previous versions (unless you got the “Something Happened” error when trying to install it). Spark on Hadoop is much the same: new capabilities built on a familiar foundation.

Conclusion

Spark offers lightning-fast batch and stream processing for big data, and running it on Hadoop keeps the tried-and-true HDFS underneath for reliable storage, giving you advanced processing power and reliability in the same package.

For a more in-depth introduction to Spark, read the free interactive eBook: Getting Started with Spark: From Inception to Production, by James A. Scott.

