Analysis: R Growth Continues in Popularity of Data Analysis Software

March 22, 2011

Bob Muenchen, author of R for SAS and SPSS Users and co-author of R for Stata Users, has updated his in-depth analysis of the popularity of data analysis software. Determining “popularity” for software is a tricky task, but this analysis looks at several different metrics: mailing list traffic, blogs, search volumes, job listings and other such indirect methods, as well as more traditional surveys. A few nuggets that leapt out at me related to R:

  • In terms of scholarly activity: citations of R continue to increase. Commercial alternatives like SAS, SPSS and Stata have apparently been hit hard by funding restrictions in recent years.
  • The (literally) exponential growth in the number of R packages published on CRAN continues unabated.
  • Measuring popularity via message volumes on email lists is particularly tricky, especially given the fragmentation of lists into special topics, and the proliferation of other discussion forums outside of email. It was interesting to learn that the most popular discussion topic on the SAS-L mailing list in 2010 was R.
  • Google Insights is an interesting way of looking at popularity via search volumes, but it’s highly dependent on the search term you track. The analysis does a great job of making an apples-to-apples comparison while avoiding false positives by looking at trends of searches for “XXX code for” and “XXX graph” (where XXX is SAS, SPSS, Stata or R). On the other hand, a search for “analysis in XXX”/“XXX analysis” puts SPSS on top. R is growing strongly in both cases, though.

One thing I might suggest for the next update in 6 months is to include volumes of tags on question-and-answer sites like the various StackExchange sites and Quora. For example, the R tag on CrossValidated had 260 questions 3 months ago and now has 492; it would be interesting to track that growth over time and compare it to other data analysis packages.
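
To put those two CrossValidated numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the tag count grew at a constant compound rate over exactly three monthly periods, which is my simplification rather than anything measured, and the function and variable names are purely illustrative.

```python
import math

def compound_growth(start, end, periods):
    """Return (total growth, per-period growth rate, doubling time in periods).

    Assumes constant compound growth between the two observations.
    """
    ratio = end / start
    per_period_factor = ratio ** (1.0 / periods)
    # Doubling time follows from factor ** t == 2.
    doubling_time = math.log(2) / math.log(per_period_factor)
    return ratio - 1.0, per_period_factor - 1.0, doubling_time

# Figures quoted above: 260 questions three months ago, 492 now.
total_growth, monthly_rate, doubling_months = compound_growth(260, 492, 3)

print(f"Total growth over the period: {total_growth:.1%}")
print(f"Implied monthly growth rate:  {monthly_rate:.1%}")
print(f"Implied doubling time:        {doubling_months:.1f} months")
```

With only two data points this is a crude estimate, which is exactly why tracking the counts over a longer period would be more informative.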

r2stats.com: The Popularity of Data Analysis Software