Behind the scenes of REvolution’s 64-bit Windows port of R

April 13, 2009

I’m very happy to report that after 6 months of development by a team of 8 developers (at its peak) and two months of beta testing, we’re ready to release REvolution R Enterprise for the 64-bit Windows platform.

Although much of the work of porting R to a 64-bit codebase was done several years ago by the R Core Team for the Linux and MacOS platforms, working binaries for the 64-bit Windows platform have proven problematic until now. The main problem has been finding a reliable toolchain to compile R on 64-bit Windows. R is routinely compiled for 32-bit Windows systems with the free MinGW compiler (a version of gcc for Windows), but it has proven unreliable for 64-bit builds.

The REvolution development team applied their expertise in the R source code and Intel’s C++ and Fortran compilers to build R for 64-bit Windows. R is a large program, and building it on a compiler other than gcc required several changes to the core R code (which have of course been released back to the R Project as GPL sources). On the other hand, using the Intel compilers allowed us to take advantage of the architecture-native optimizations Intel’s own compiler provides, boosting the performance of R. To further improve performance, we have integrated the Intel MKL numerical libraries: in addition to providing key optimizations of core mathematical routines, the library enables R to use multiple processors in parallel, further speeding up some mathematical functions on multicore workstations.

Another major challenge in porting R to 64-bit Windows has been in porting the packages. As we’ve been working on this project, we heard from many customers who were interested in using REvolution R on 64-bit Windows … provided the packages they used were also available. Of course, for just about every customer this was a different set of packages. To address this, we set about porting every R package to 64-bit Windows and making binary versions available. R has over 2000 packages (counting those from the Bioconductor project), so this was a huge undertaking.

Packages that contained only R code run without modification, but those that contained compiled code were time-consuming to port. Some packages are integrations with other tools which themselves needed to be ported to 64-bit Windows (as was the case with Graphviz, a visualization tool that many of the Bioconductor packages depend on). Porting code is always a good way to expose hard-to-find bugs, and we’ve found and fixed quite a few along the way (and communicated those changes back to the package authors). The final tally of 64-bit Windows packages that will be available on release isn’t yet complete, but already stands at over 1500. (Unfortunately, it will not be possible for us to port all packages: some have license terms that disallow it, and others integrate with third-party tools that can’t be ported.)

It’s been a long and tricky project, but we’re very excited to be able to release this port. I want to congratulate our development team on a job well done, and give thanks to the many beta testers that gave such detailed feedback. Uwe Ligges, a frequent R contributor and member of the beta test team, had this to say about the project:

“REvolution has worked hard to get R ported to x64 Windows and enables high performance useRs to work on very memory consuming applications even under Windows. During the time I had the opportunity to test the very well working beta version on the winbuilder (64-bit) machine that is used to build CRAN binary packages for 32-bit R, REvolution developers were extremely responsive and helpful. Package updates that were required in order to work with REvolution’s R version had been released at REvolution’s binary package repository within a day or two. Congratulations to REvolution for having such a well working team of developers!”

On behalf of all of us here at REvolution, I also want to acknowledge and thank the R Core team who created R in the first place (and therefore most of the code in REvolution R). In particular I’d like to thank Brian Ripley and those others who took the lead in porting R to 64-bit systems at a time when 64-bit applications were still rare, giving the R community early access to large-scale data analysis. (Thanks also to Brian Ripley for his detailed comments during the beta test.) I also want to give thanks to all the package developers who have enriched the R ecosystem, and especially those that have helped us during the package porting project.

Based on our experience with R users and the downloads from our CRAN mirror, Windows is the largest installed platform for R, and a 64-bit version of REvolution R Enterprise will help R users process much larger data sets than before. With some tweaking, 32-bit Windows machines can access up to 3 GB of RAM, which in practice limits R on 32-bit Windows to datasets smaller than a gigabyte. On the 64-bit Windows platform those limitations disappear: REvolution R Enterprise there can manage much larger datasets using a greatly expanded virtual-memory address space, and the fact that you can install and use much more than 4 GB of RAM can make those analyses run even faster. (Look for some specific examples in a subsequent post.)
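To see roughly why a 3 GB ceiling translates into sub-gigabyte datasets, here is a back-of-the-envelope sketch of the arithmetic. It is written in Python purely as a calculator and is not part of REvolution R. The function names, the example matrix dimensions, and the rule of thumb that an analysis needs about three copies of the data in memory are all illustrative assumptions, not measured values. The only facts it relies on are that R stores numeric values as 8-byte doubles and that 32-bit pointers can address 2^32 bytes.

```python
# Back-of-the-envelope memory arithmetic. All dataset sizes and the
# "working copies" factor below are illustrative assumptions.

BYTES_PER_DOUBLE = 8   # R numeric values are 8-byte double-precision floats
GIB = 2 ** 30          # one gibibyte in bytes


def address_space_bytes(pointer_bits):
    """Number of distinct byte addresses reachable with pointers of this width."""
    return 2 ** pointer_bits


def numeric_matrix_bytes(rows, cols):
    """Approximate payload of a rows x cols matrix of doubles (headers ignored)."""
    return rows * cols * BYTES_PER_DOUBLE


def fits(rows, cols, usable_bytes, working_copies=3):
    """True if the matrix plus temporary copies fits in the usable memory.

    working_copies is a hypothetical rule of thumb for the extra copies an
    analysis may create; it is not a figure from the original post.
    """
    return numeric_matrix_bytes(rows, cols) * working_copies <= usable_bytes


if __name__ == "__main__":
    rows, cols = 10_000_000, 20  # hypothetical dataset
    size = numeric_matrix_bytes(rows, cols)
    print(f"{rows} x {cols} matrix: {size / GIB:.2f} GiB")
    print("32-bit address space:", address_space_bytes(32) // GIB, "GiB")
    print("fits in 3 GiB with copies:", fits(rows, cols, 3 * GIB))
    print("fits in 64 GiB with copies:", fits(rows, cols, 64 * GIB))
```

Under these assumptions, a dataset of about 1.5 GiB already overflows a 3 GiB limit once temporary copies are counted, while the same data fits comfortably on a 64-bit machine with more RAM installed.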

REvolution R Enterprise 2.0 will be available to subscribers tomorrow, April 14. This release will also include a major update to ParallelR (more about that later). For details about subscriptions, please contact our Sales team. Academic pricing is also available.
