SmartData Collective
Apache Spark Use Cases

kingmesal
Last updated: 2015/09/28 at 8:53 AM

Sure, Apache Spark looks cool, but does it live up to the hype? Is there anything you can actually do with it? Yes: there are some pretty cool use cases going on right now.

Exploratory Analytics

One of the best features of modern programming languages is that many of them offer interactive shells, from Bash to Python to Scala. Instead of a time-consuming write/compile/test/debug cycle, you can try out your ideas in the shell immediately.

Spark takes this idea and applies it to Big Data. You can explore your data interactively using either Python or Scala without having to wait on batch queries. Spark lets you use any kind of data, whether it’s structured, semi-structured, or unstructured. You can also use any kind of programming model you want: imperative, functional, or object-oriented.

The key to this is Spark’s use of Resilient Distributed Datasets, or RDDs. RDDs can be held in memory, which is much faster than reading from disk, and Spark can spill to disk when there is more data than fits in memory. If you think this would be a recipe for slow performance with Big Data, think again. Spark uses lazy evaluation: transformations are only recorded when you declare them, and computation runs only when you ask for a result, such as printing a value. You can set up complex queries and then run them later.
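To make lazy evaluation concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `LazyDataset` class and its `map`, `filter`, and `collect` methods are hypothetical names chosen to echo the idea. Transformations are only recorded, and nothing runs until an action asks for a result.

```python
class LazyDataset:
    """Toy stand-in for an RDD (not Spark's API): records steps, runs them on demand."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection, e.g. a list
        self._steps = tuple(steps)   # recorded (kind, function) pairs

    def map(self, fn):
        # Transformation: returns a new dataset, does no work yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: returns a new dataset, does no work yet.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []  # records every time square() actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset([1, 2, 3, 4, 5])
even_squares = numbers.map(square).filter(lambda v: v % 2 == 0)

work_before_action = len(calls)   # still 0: nothing has been computed
result = even_squares.collect()   # the pipeline runs here
```

Building `even_squares` costs nothing; `square` is called only once `collect()` runs, which is the behavior the paragraph above describes.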

RDDs are immutable: a transformation produces a new RDD rather than changing the existing one, so exploring a dataset cannot corrupt it. Spark also tracks each RDD’s lineage, the history of transformations that produced it, and can use that history to recompute data lost to a failure. Together these make exploring large datasets safe. You can also connect your other databases using SQL drivers.
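The interplay of immutability and lineage can be sketched in a few lines of plain Python. This is an illustration only, not how Spark is implemented: `TrackedDataset`, `lose_cache`, and the step descriptions are hypothetical. Each transformation returns a new object, and a lost result is rebuilt by replaying the recorded steps.

```python
class TrackedDataset:
    """Toy model (not Spark's API) of an immutable dataset with lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # immutable snapshot of the input
        self.lineage = tuple(lineage)   # (description, function) pairs
        self._cache = None              # computed values, if any

    def map(self, description, fn):
        # Never mutates self: returns a new dataset with a longer lineage.
        return TrackedDataset(self._source, self.lineage + ((description, fn),))

    def compute(self):
        # Replay the lineage from the source whenever no cached result exists.
        if self._cache is None:
            values = self._source
            for _, fn in self.lineage:
                values = tuple(fn(v) for v in values)
            self._cache = values
        return self._cache

    def lose_cache(self):
        # Simulate a failure that wipes the computed data.
        self._cache = None


raw = TrackedDataset([" 3", "4 ", "5"])
cleaned = raw.map("strip whitespace", str.strip).map("parse int", int)

first = cleaned.compute()
cleaned.lose_cache()            # pretend the in-memory copy was lost
recovered = cleaned.compute()   # rebuilt from the recorded lineage
history = [description for description, _ in cleaned.lineage]
```

Note that `raw` is untouched by the two `map` calls, so any earlier stage of the exploration can be revisited.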

Machine Learning

Spark offers some powerful machine learning tools. As with exploratory analytics, you can use the interactive REPL (a common acronym for an interactive shell, short for read-eval-print loop) to develop algorithms in real time. Spark can also cache frequently accessed datasets in memory for maximum efficiency. You can develop your own algorithms or use the efficient implementations that ship with MLlib.

Machine learning is becoming important for threat detection. One client of MapR Technologies is a credit card company that uses Spark to detect potential credit card fraud. Another client uses it to detect possible network threats.

Real-Time Dashboards

Big Data is no good if you have no way to see it. Apache Spark offers the ability to power real-time dashboards. The goal of Big Data is to sift through large amounts of data to find insights that people in your organization can act on.

While a programmer might be able to use the REPL described earlier to explore data, most people are not going to be willing to learn SQL, Scala, Python, or Spark in order to look for trends.

Spark Streaming can be leveraged to perform low-latency, window-based aggregations of your data. Spark can combine both streaming and offline databases for an optimal view of a company’s data, enabling dashboards which let users drill down to get an easy, graphical, intuitive view of their data. The ability to connect to other databases using SQL drivers gives a holistic view of an organization.
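As a rough illustration of a window-based aggregation, the sketch below counts events per fixed time window in plain Python. It does not use Spark Streaming; the function name `tumbling_window_counts`, the event tuples, and the 10-second window are assumptions made for the example. A dashboard would typically plot one such count per window.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(per_key) for start, per_key in sorted(counts.items())}


# Hypothetical event stream: (seconds since start, event type).
events = [
    (0, "login"), (3, "purchase"), (9, "login"),
    (10, "login"), (14, "purchase"), (21, "login"),
]
windows = tumbling_window_counts(events, window_seconds=10)
```

This uses the simplest kind of window, where each event falls into exactly one bucket; streaming engines generally also allow overlapping (sliding) windows, which this sketch leaves out.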

ETL

With the ability to process massive amounts of data quickly, Apache Spark is ideal for data warehouses. While your databases may be structured, in the real world, data can be anything but. You might be looking for a way to clean and transform data coming from sources inside and outside your organization. Apache Spark makes the task much less daunting.

Spark offers a variety of ETL (Extract, Transform, and Load) tools. Spark includes optimized scheduling for efficient I/O on the large datasets that data warehousing employs. The in-memory nature of Spark lets you perform aggregations, shuffles, and other operations on your data quickly.

Spark lets you use tools you’re already familiar with. You can use SQL to perform ETL, flattening the learning curve for you and your administrators in getting data into Spark. You can also port Pig scripts to Spark and run Hive queries.
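The extract, transform, and load steps can be sketched end to end with nothing but the Python standard library. This is a small-scale stand-in, not Spark: the CSV sample, the `sales` table, and the column names are all invented for the example, and `sqlite3` merely plays the role of the SQL-capable target.

```python
import csv
import io
import sqlite3

# Hypothetical messy input: stray whitespace, a missing amount, mixed case.
RAW_CSV = """customer,amount,country
 alice ,10.50,us
bob,,US
carol,7.25,De
alice,4.50,US
"""


def extract(text):
    # Extract: parse the raw CSV text into dictionaries keyed by column name.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: trim whitespace, drop rows without an amount, normalize case.
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append(
            (row["customer"].strip(), float(amount), row["country"].strip().upper())
        )
    return cleaned


def load(rows):
    # Load: write the cleaned rows into a SQL table (in memory for this demo).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


conn = load(transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
```

Once the data is loaded, the aggregation is ordinary SQL, which is the point made above about reusing skills you already have.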

Conclusion

With fast in-memory processing, Apache Spark offers up a whole new way to explore and act on your data. The MapR distribution of Spark gives you everything you need to make the best use of your data right out of the box.

For a more in-depth introduction to Spark, read Getting Started with Spark: From Inception to Production, a free interactive eBook by James A. Scott.
