By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data-driven image seo
    Data Analytics Helps Marketers Substantially Boost Image SEO
    8 Min Read
    construction analytics
    5 Benefits of Analytics to Manage Commercial Construction
    5 Min Read
    benefits of data analytics for financial industry
    Fascinating Changes Data Analytics Brings to Finance
    7 Min Read
    analyzing big data for its quality and value
    Use this Strategic Approach to Maximize Your Data’s Value
    6 Min Read
    data-driven seo for product pages
    6 Tips for Using Data Analytics for Product Page SEO
    11 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Performance benefits of linking R to multithreaded math libraries
Share
Notification Show More
Latest News
anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security
ai in software development
3 AI-Based Strategies to Develop Software in Uncertain Times
Software
ai in ppc advertising
5 Proven Tips for Utilizing AI with PPC Advertising in 2023
Artificial Intelligence
data-driven image seo
Data Analytics Helps Marketers Substantially Boost Image SEO
Analytics
ai in web design
5 Ways AI Technology Has Disrupted Website Development
Artificial Intelligence
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Business Intelligence > Performance benefits of linking R to multithreaded math libraries
Business Intelligence

Performance benefits of linking R to multithreaded math libraries

DavidMSmith
Last updated: 2010/06/11 at 6:48 PM
DavidMSmith
8 Min Read
SHARE
- Advertisement -

R wasn’t originally designed as a multithreaded application — multiprocessor systems were still rare when the R Project was first conceived in the mid-90’s — and so, by default, R will only use one processor of your dual-core laptop or quad-core desktop machine when doing calculations. For calculations that take a long time, like big simulations or modeling of large data sets, it would be nice to put those other processors to use to speed up the computations. There are several parallel processing libraries for R available that allow you to explicitly run loops in R simultaneously (ideally, each on a different processor), but using them does require you to rewrite your code accordingly. 

But there is a way to make use of all your processing power for many computations in R, without changing a line of code. That’s because R is a statistical computing system, and at the heart of many of the algorithms you use on a daily basis — data restructuring, regressions, classifications, even some graphics functions — is linear algebra. The data are transformed into vector and matrix objects, and the internals of R have been cleverly designed to link to a standard “BLAS” API to …

R wasn’t originally designed as a multithreaded application — multiprocessor systems were still rare when the R Project was first conceived in the mid-90’s — and so, by default, R will only use one processor of your dual-core laptop or quad-core desktop machine when doing calculations. For calculations that take a long time, like big simulations or modeling of large data sets, it would be nice to put those other processors to use to speed up the computations. There are several parallel processing libraries for R available that allow you to explicitly run loops in R simultaneously (ideally, each on a different processor), but using them does require you to rewrite your code accordingly. 

More Read

Planview Improves Long-Range Planning Potential

What Every CEO Needs to Know About IT
The ‘Big Data’ Buzz – Revolution or Evolution?
A New Kind Of Data Warehousing Will Emerge in 2011 According To Gartner
Who Are the Animals of Analytics-Based Performance Management?

But there is a way to make use of all your processing power for many computations in R, without changing a line of code. That’s because R is a statistical computing system, and at the heart of many of the algorithms you use on a daily basis — data restructuring, regressions, classifications, even some graphics functions — is linear algebra. The data are transformed into vector and matrix objects, and the internals of R have been cleverly designed to link to a standard “BLAS” API to perform calculations on vectors and matrices. The binaries provided by the R Core Group on CRAN (with one exception, see below) are linked to an “internal BLAS which is well-tested and will be adequate for most uses of R”, but is not multi-threaded and so only uses one core. But the beauty of linking to the BLAS API is that you can re-compile R to link to a different, multi-threaded BLAS library and, voilà, suddenly many computations are using all cores and therefore run much faster.

The MacOS port of R on CRAN is linked to ATLAS, a “tuned” BLAS that uses multiple cores for computations. As a result, R on a multi-core Mac (as all new Macs are these days) really zooms. But the Windows binaries on CRAN are not linked to an optimized BLAS. It’s possible to compile and link R yourself, but it can be tricky.

That’s what we do at Revolution for our Windows and Linux distributions of Revolution R. When we compile R, we link it to the Intel Math Kernel Libraries, which includes a high-performance BLAS implementation tuned to multi-core Intel chips. “Tuning” here means using efficient algorithms, optimized assembly code that exploits features of the chipset, and multi-threaded algorithms that use all cores simultaneously. As a result, you get some serious speed boosts for many operations in R, especially on a multi-core system. Here are some examples:

CalculationSizeCommandR 2.9.2Revolution R
(1 core)
Revolution R
(4 cores)

Matrix Multiply
A*A’ 
10000×5000B <- crossprod(A)243 sec22 sec5.9 sec

Cholesky Factorization5000×5000C <- chol(B)23 sec3.8 sec1.1 sec

Singular Value Decomposition5000×5000S <- svd (A,nu=0,nv=0)62 sec13 sec4.9 sec

Principal Components Analysis10000×5000P <- prcomp(A)237 sec41 sec15.6 sec

As you can see, using the Intel MKL libraries on a 4-core machine gives some dramatic speedups (about a quarter of the 1-core time, as you might expect). Perhaps more surprisingly, using the Intel MKL libraries on a 1-core machine is also faster than using R’s standard BLAS library: this is a result of the optimized algorithms, not additional computing power. You even get improvements on non-Intel chipsets (like AMD).

[A side note: These calculations were actually all run on an 8-core machine, specifically, an Intel Xeon 8-core CPU with 18 GB system RAM running Windows Server 2008 operating system. The complete benchmark code is available on this page. The results for Revolution R 1-core and 4-core were calculated by restricting the Intel MKL library to use 1 thread and 4 threads, using the Revolution-R-specific commands setMKLthreads(1) and setMKLthreads(4) respectively. This has the effect of using only the power of the specific number of cores, even when more cores are available. Note: if you’re using Revolution R and are doing explicit parallel programming with doSMP, it’s a good idea to call setMKLthreads(1) first. Otherwise, your parallel loops and the multi-threaded linear algebra computations will compete for the same processor and actually degrade performance.]

These results are dramatic, but multi-threaded BLAS libraries aren’t a panacea. Not all R commands ultimately link to BLAS code, even ones you might expect. (For example, lm for regression uses a non-BLAS QR decomposition by default.) And if your R code ultimately does not involve linear algebra, you can’t expect any improvement at all. (For example, the “Program Control” R benchmarks by Simon Urbanek show only marginal performance gains in Revolution R.) This is when explicit parallel programming is the route to improved performance. We’re also working on dedicated statistical routines for Revolution R Enterprise that are explicitly multi-threaded for single-machines and also distributable to multiple machines in a cluster or in the cloud, but that’s a topic for another post.

Revolution Analytics: Performance Benchmarks

Link to original post

TAGGED: performance management, r project
DavidMSmith June 11, 2010
Share this Article
Facebook Twitter Pinterest LinkedIn
Share
- Advertisement -

Follow us on Facebook

Latest News

anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security
ai in software development
3 AI-Based Strategies to Develop Software in Uncertain Times
Software
ai in ppc advertising
5 Proven Tips for Utilizing AI with PPC Advertising in 2023
Artificial Intelligence
data-driven image seo
Data Analytics Helps Marketers Substantially Boost Image SEO
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

Planview Improves Long-Range Planning Potential

14 Min Read

What Every CEO Needs to Know About IT

9 Min Read

The ‘Big Data’ Buzz – Revolution or Evolution?

6 Min Read

A New Kind Of Data Warehousing Will Emerge in 2011 According To Gartner

0 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data
giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?