REvolution R Enterprise 2.0 released

DavidMSmith

As I previewed yesterday, REvolution R Enterprise 2.0 is now available to subscribers. In yesterday’s post, I focused mainly on the process of creating the release; today, I’d like to talk about some of its new features.

64-bit Windows support

REvolution R Enterprise 2.0 is the only version of R available for 64-bit Windows systems. This means that it is now possible to analyze much larger data sets on Windows systems than ever before. The reason is that, with a few exceptions, all of the computational routines in R work in memory: the entire data set, plus any temporary copies and working variables required by the routine, must fit into the operating system’s memory at once. As a rough rule of thumb, most statistical routines (like regression or tree models) will require at least three temporary copies of the data. So on a 32-bit Windows system, where the maximum memory available is around 3 gigabytes, the data plus its three copies must share that space, which means you can analyze a data set of 750 megabytes, tops. But on a 64-bit system, as long as you have enough disk space available for virtual memory, you can analyze much larger data sets. In fact, your limitation will likely be the amount of time you have to wait rather than the storage you have available.

Even better, you can install and use more than 4 gigabytes of RAM (memory chips) on 64-bit Windows systems, and for large data set analysis the more RAM you have installed, the faster it will run. You can expect the best performance when the installed RAM is at least 3-4 times the size of the data. (You don’t need that much, but the analysis will run slower if you have less.)

This opens R to a whole new world of possibilities for analyzing data on 64-bit Windows systems. For example, you can now:
  • Estimate correlation matrices (and calculate Value at Risk) for much larger financial portfolios
  • Use the Bioconductor suite to analyze pharmaceutical and biochemical data from much larger microarrays   
  • Build predictive models about purchasing behavior on larger databases of customer data, without the need for sampling
Brian Ripley, Professor of Applied Statistics at the University of Oxford and member of the R Core Development Team, reported using REvolution R Enterprise for genetic analysis during the beta test. He said:

“REvolution are to be congratulated on a technical tour de force…This will bring to Windows users the freedom to use R on large problems that users of Unix-like platforms have enjoyed for several years.  We did some testing on a 32GB Windows box on behalf of a computational genetics project, and the beta was 100% reliable and comparable in performance to the Rcore 32-bit distribution but able to tackle much larger problems.”

Basically, if you’ve tried to use R to analyze a large dataset on Windows before and gotten an error like “cannot allocate vector of size 858213 Kb”, switching to a 64-bit version of Windows with REvolution R Enterprise 2.0 is likely to help.
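
To make the memory arithmetic in this section concrete, here is a minimal sketch. It is written in Python rather than R purely for illustration, and the function names (max_dataset_mb, recommended_ram_mb) are made up for this example. It assumes the rule of thumb given earlier: the data plus three temporary copies must fit in memory, with 3 gigabytes taken as 3000 megabytes.

```python
def max_dataset_mb(available_memory_mb, temp_copies=3):
    """Largest data set (in MB) that fits alongside its temporary copies."""
    # The data itself plus `temp_copies` working copies must all fit at once.
    return available_memory_mb / (1 + temp_copies)


def recommended_ram_mb(dataset_mb, multiple=4):
    """RAM (in MB) suggested for best performance: a multiple of the data size."""
    return dataset_mb * multiple


# 32-bit Windows: roughly 3 GB (taken here as 3000 MB) of usable memory.
print(max_dataset_mb(3000))  # 750.0

# A hypothetical 2000 MB data set: 3-4 times its size in RAM for best speed.
print(recommended_ram_mb(2000, multiple=3))  # 6000
print(recommended_ram_mb(2000, multiple=4))  # 8000
```

The numbers are only as good as the rule of thumb: a routine that makes more temporary copies shrinks the largest workable data set accordingly.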

ParallelR upgraded

REvolution R Enterprise 2.0 comes with ParallelR, a suite of packages from REvolution Computing that simplify parallel programming in R. If you have a multiprocessor workstation (and most higher-end laptops and desktops sold today have at least 2 processors or cores), then parallel programming is a way of instructing R to use all processors simultaneously to reduce computation time. REvolution R automatically uses multiple processors for some key mathematical routines like matrix multiplication and decomposition, but for general R code only one processor will be used at a time unless you use the features of ParallelR.

There are a few other systems available for parallel programming in R, but after talking to users who had attempted to use them, we found that most attempts by casual users had been abandoned in frustration. This is because these systems were designed primarily for use on clusters (collections of workstations) for distributed computing. This in turn requires complex procedures for setting up the environment: designating the server and clients, nominating processors on each, bypassing security measures so that the instances of R can talk to one another, and so on. We also heard that writing parallel programs in these systems was complicated: the programmer had to deal with a lot of unfamiliar concepts like clients, servers, shared variables, message-passing and so on. When correctly configured these systems can offer excellent performance, but unless you have the computer-science background and training to rewrite your R programs using these new paradigms the performance gain is, well, zero.

ParallelR is designed so that the casual R programmer can easily convert “embarrassingly parallel” R programs to run faster on multiprocessor workstations. Embarrassingly parallel problems are those with sequences of steps that can be arbitrarily reordered because no step depends on the results of any other step. Common examples in the Statistics world are simulations, bagging and boosting procedures (fitting random forest models, for example), predictions, and fitting the same model to a sequence of dependent variables (a series of regions or segments, for example). 
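
Here is a small sketch of what “no step depends on the results of any other step” looks like in practice. It is in Python rather than R, purely as an illustration, and the simulate function and its seeds are invented for the example. Each step is a tiny simulation with its own random number generator, so running the steps in a different order gives exactly the same answers.

```python
import random
import statistics


def simulate(seed, n=1000):
    """One independent step: the mean of n standard normal draws."""
    rng = random.Random(seed)  # private generator, no state shared between steps
    return statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])


seeds = list(range(8))

# Run the steps in their natural order.
in_order = {s: simulate(s) for s in seeds}

# Run the same steps in an arbitrary order.
shuffled = seeds[:]
random.Random(42).shuffle(shuffled)
reordered = {s: simulate(s) for s in shuffled}

# Every step gives the same answer regardless of when it ran.
print(in_order == reordered)  # True
```

If a step read a value written by an earlier step (a running total, say), the reordered run could differ, and the loop would no longer be embarrassingly parallel.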

The key innovation is a new function called foreach, which you can use to replace the traditional for loop in R. If you had enough processors available, each iteration of the loop would run at the same time, in parallel. More realistically, a few iterations will run in parallel at any one time: one per available processor. ParallelR handles all the complexity of scheduling each iteration when a processor becomes available and collecting the results, and automatically ensures that the local variables of the loop are replicated so the values from one iteration do not trample those of another. You can see some examples of foreach in action on the REvolution website, or in this recent webcast on backtesting financial models where using a quad-core system in parallel reduced the computational time by almost 75%.
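
The pattern is easier to see in code. The sketch below is not ParallelR or the foreach function itself; it is a rough Python analogue using the standard library’s concurrent.futures module, and the names iteration, sequential_loop and parallel_loop are invented for the example. It shows the same idea: the loop body becomes a function of its index, a pool runs one iteration per available worker, and the results are collected in loop order.

```python
from concurrent.futures import ProcessPoolExecutor
import math


def iteration(i):
    """Body of the loop: depends only on its own index."""
    total = 0.0  # local to this call, so iterations cannot trample each other
    for k in range(1, 50_000):
        total += math.sin(i * k) / k
    return total


def sequential_loop(indices):
    """The traditional for loop: one iteration at a time."""
    return [iteration(i) for i in indices]


def parallel_loop(indices, executor_cls=ProcessPoolExecutor, workers=None):
    """Same loop on a pool of workers; results come back in index order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(iteration, indices))


if __name__ == "__main__":
    indices = range(8)
    # Both versions compute the same values; only the scheduling differs.
    print(parallel_loop(indices) == sequential_loop(indices))
```

With four workers and iterations of similar cost, the ideal case finishes in about a quarter of the sequential time, a 75% reduction, which is consistent with the “almost 75%” seen on the quad-core system in the webcast. Real runs fall a little short of that because of the overhead of starting workers and moving data.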

For really meaty jobs you can speed up performance even more by adding more processors with a cluster. You don’t need to have a dedicated laboratory full of high-powered workstations available: you can always harness those PCs and Macs sitting idle around the office overnight for your heaviest number-crunching problems. ParallelR makes it easy to take the code you’ve already run in parallel on your desktop and extend it to a cluster of machines running R using a feature called sleighs. And if the overnight cleaner accidentally unplugs one of those machines you won’t have wasted a night’s computing time: ParallelR has fault tolerance so your job will still complete even if some of the nodes in the cluster become unavailable.
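
As a rough picture of what fault tolerance means here, consider the toy scheduler below. It is an illustration in Python, not how ParallelR or its sleighs are implemented: the node names, the NodeUnavailable exception and the run_on_cluster function are all hypothetical. The point is only that when a node drops out, its task goes back in the queue and is finished by one of the surviving nodes.

```python
from collections import deque


class NodeUnavailable(Exception):
    """Raised when a (simulated) cluster node stops responding."""


def run_on_cluster(tasks, nodes, run_task):
    """Run every task, reassigning work when a node drops out."""
    pending = deque(tasks)
    alive = deque(nodes)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no nodes left to finish the job")
        task = pending.popleft()
        node = alive[0]
        alive.rotate(-1)  # round-robin over the surviving nodes
        try:
            results[task] = run_task(node, task)
        except NodeUnavailable:
            alive.remove(node)    # stop scheduling onto the lost node
            pending.append(task)  # hand its task to another node later
    return results


def square_unless_unplugged(node, task):
    # Simulated failure: "pc-2" is unplugged before it can do any work.
    if node == "pc-2":
        raise NodeUnavailable(node)
    return task * task


results = run_on_cluster(range(5), ["pc-1", "pc-2", "pc-3"], square_unless_unplugged)
print(sorted(results.items()))
```

Every task still ends up with a result even though one of the three machines never completes anything; the job simply takes longer because fewer nodes share the work.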

Enterprise Support and Service

As our subscription-level version of R, REvolution R Enterprise is backed by REvolution Computing and comes with full technical support services from our teams. It also comes ready for use in validated environments, such as for the analysis of FDA-controlled clinical trials.

Looking for more?

In the coming weeks we’ll have more examples and stories about REvolution R Enterprise here, but in the meantime if you’d like more information or want to enquire about subscriptions and editions, please just contact REvolution Computing and we’ll be happy to help. 

REvolution Computing (press release): REvolution R Enterprise with Parallel Processing Now Available for 64-bit Windows
