R Integrated Throughout the Enterprise Analytics Stack
The past couple of years have seen dramatic growth in the use of the R language in the enterprise. R has always been pervasive in academia for research and teaching in statistics and data science, and as new graduates trained in R have migrated to the workplace, the demand for R in corporations has become more and more intense.
Database vendor Oracle estimates that "R has attracted over two million users since its introduction". James Kobielus, noted Forrester analyst and predictive analytics expert, recently said in PCWorld that "R has become a real ubiquitous force in advanced analytics. It's everywhere. Enterprise adoption of it has been growing steadily. When we ask our customers what they're using for statistical modeling they'll say SAS or [IBM's] SPSS, but they increasingly say R in the same breath."
With rapid growth of the use of R in the enterprise comes a corresponding increase in demand for enterprise support for R from its users, and demand for integration of R into corporate systems from IT (both areas in which Revolution Analytics provides software and expertise). Most large organizations have a sophisticated infrastructure devoted to data analysis, with an "analytics stack" of software to provide data warehousing and query, predictive analytics, reporting, presentation, and Business Intelligence (BI). As a result software vendors at every layer of this stack have added functionality to integrate R, accommodating demand from both users and IT, and to serve the needs of data-driven business decision makers. Let's take a look at some of the applications within the analytics stack which now provide integration with R.
The Data Layer
The data layer is where the lifeblood of the analysis — the data — is stored and prepared. Especially for high-performance and big-data applications, analytics based in R can benefit from the infrastructure that the data layer provides. The IBM blog has a great post with an in-depth discussion about integrating R with the data layer.
IBM Netezza, the high-performance data-warehousing appliance, is integrated with R (in partnership with Revolution Analytics). R users can use Revolution R Enterprise to run massively-parallel computations in R within the IBM Netezza appliance and implement high-performance, big-data analytics (with high-frequency financial data, for example). [Update: a free webinar on February 29 will describe the integration between IBM Netezza and Revolution R Enterprise in detail.]
Oracle announced last year a forthcoming connection between R and Oracle, which was made available in February 2012. Oracle R Enterprise is aimed at statisticians who "don't know SQL" and are "not familiar with DBA tasks". It is available as part of the Oracle Advanced Analytics option (priced at around $23,000 per core), and provides a "transparency layer" with functions to connect to Oracle and use R functionality in the Oracle database. Oracle also maintains the open-source ROracle package, which provides similar functionality for open-source R.
Cloudera's Distribution Including Apache Hadoop provides support for R in partnership with Revolution Analytics. This connection makes it possible to manipulate Hadoop data stores in R directly from HDFS and HBase, and gives R programmers the ability to write MapReduce jobs in R using Hadoop Streaming.
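To give a flavor of what a MapReduce job written in R for Hadoop Streaming might look like, here is a minimal sketch of a word-count mapper. It relies only on the general Hadoop Streaming contract (input records arrive on standard input, and tab-separated key/value pairs are written to standard output) and on base R. The function names map_line and run_mapper are hypothetical, chosen for illustration; this is not the API of the Cloudera or Revolution Analytics integration.

```r
# Illustrative word-count mapper for Hadoop Streaming (a sketch, not a vendor API).
# map_line and run_mapper are hypothetical helper names.

# Turn one line of text into "<word>\t1" records.
map_line <- function(line) {
  words <- unlist(strsplit(tolower(line), "[^a-z0-9]+"))
  words <- words[nzchar(words)]            # drop empty tokens
  if (length(words) == 0) return(character(0))
  paste(words, 1L, sep = "\t")             # key<TAB>value
}

# Read records one line at a time from a connection and emit the
# mapped key/value pairs on standard output.
run_mapper <- function(con = file("stdin", open = "r")) {
  on.exit(close(con))
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    out <- map_line(line)
    if (length(out) > 0) writeLines(out)
  }
  invisible(NULL)
}

# Local dry run on an in-memory connection instead of stdin.
run_mapper(textConnection("R is everywhere. R is growing."))
```

In an actual streaming job, the script would end with a plain call to run_mapper() so that it reads from standard input, and a matching reducer script would sum the counts for each key.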
IBM BigInsights, the Hadoop platform from IBM, is also integrated with R and Revolution R Enterprise. BigInsights queries can make use of the MapReduce construct while running R computations in parallel.
Teradata's Enterprise Data Warehousing platform provides in-database analytics using R via the free teradataR package. This package allows R users to connect to Teradata, create data frames linked to Teradata, and call in-database analytic functions. The Teradata Aster MapReduce Platform also provides integration with R.
Sybase RAP, the edition of the Sybase database for financial data, provides integration with R. Providing the R language alongside Sybase RAP allows for faster algorithm development and extensive backward testing on historical data. Sybase also regularly highlights R integration in its financial newsletters and webinars.
The Analytics Layer
The Analytics Layer is where the magic happens: statistical modeling, predictive analytics and custom data visualization. Fed with (usually) structured data sourced from the Data Layer, R is widely used here to categorize, predict, and generally provide insight into corporate data stores. In many organizations, older data analysis tools remain in use, and so interfaces to R have been added to provide support for analysts and data scientists who prefer to use R, and to fill in the gaps of these legacy tools with modern, high-performance analytics.
Revolution Analytics is the leading commercial organization focused on software and support for R. Its Revolution R Enterprise software extends open-source R with productivity interfaces, high-performance statistical computing, big-data analytics, and enterprise integration of R.
SAS has been a statistical analysis workhorse since the early 70's. Now, with so many graduates in statistics trained in R instead of SAS, SAS has introduced the ability to call R from SAS/IML. (It's also possible to call R directly from base SAS thanks to a free package developed at Roche Pharmaceuticals.) SAS JMP, the point-and-click data analysis package, now also provides support for R.
IBM SPSS Statistics, the popular desktop data analysis software known simply as SPSS before being acquired by IBM in 2009, provides integration with R via the Statistics Programmability Extension module.
RStudio, an open-source software company, provides an integrated development environment for developing code in the R language.
Matlab, a numerical computing language used by engineers, also offers the ability to call R from Matlab on Windows.
The Presentation Layer
Data analysis makes the most impact in the enterprise when it can be readily acted upon by decision makers: often, business executives not steeped in the arcana of data warehousing or statistical analysis. As a result, many reporting and business intelligence tools now make it possible to incorporate the results of analyses generated in R in the presentation layer, in a format tuned to the needs of a business audience.
You might think of Microsoft Excel as more of a spreadsheet than a presentation tool, but it is very widely used on the desktop as a "container" for static and interactive reports based on statistical analysis. While Excel does not have out-of-the-box integration with R, it is possible to integrate R-based computation into Excel spreadsheets via RExcel and Revolution Analytics' RevoDeployR web services API.
R: Integrated throughout the analytics stack
As you can see, for organizations that need to create advanced analytics applications, R is integrated throughout the analytics stack: for data access, for presentation of results, and of course for the statistical analysis process itself. This degree of integration by so many companies is indicative of the level of demand for R throughout the enterprise. As the leading provider of commercial software and support for R, Revolution Analytics supports R users throughout the organization and helps IT departments integrate Revolution R Enterprise throughout the analytics stack for high-performance and big-data applications based on the R language.
Other Posts by David Smith