What is R?

DavidMSmith
Last updated: 2009/03/26 at 3:01 PM

What is R? It seems like a simple question, but I fear this is going to be a long post.

My main motivation is to clear up a bit of confusion around the distinction between R itself and user-contributed packages for R. It was prompted by a recent discussion about R on the MedStats mailing list, which included this comment:

After all the positive comments, I would like to raise some concern about some of the non-standard R packages. I have twice experienced a serious error in R packages (not a bug, an error in the algorithm). The authors of the first one did not reply; the authors of the second one said they know about it but do not have the time to fix it. I wonder how many — especially of the PhD/postdoc written packages, which I am sure work for their project — are really working correctly in all situations? Not all of them work on their packages as hard and great work as e.g. D. Bates with his lme4 (GLMM) package, and he and users still discover bugs and flaws in it. I do not want to criticize R; I am using it and I believe that the core packages are as valid as from commercial software (or better). But as I said, I have  doubts about some hardly used ones.

It’s a fair point, but needs some clarification for readers not familiar with R. The distinction is one between “official R” and user-contributed code (which is what the commenter above is discussing).

By “official R” I mean the R project, under the control of the R Core Group. This is what you get when you download R from the CRAN website, and what’s included in the REvolution R distribution. This includes both the R interpreter (the code that implements the language at the heart of R), and the various statistical functions included in the official R distribution. These components and functions are all managed under a strict software development lifecycle, and have the highest reputation for accuracy and reliability. This is what makes R suitable for all statistical analysis applications where you need the utmost confidence in the result, such as the analysis of clinical trial data. This is R.

Now, R isn’t just a closed statistical analysis environment. It’s also designed to be a platform for other individuals to create their own methods and applications. Research institutions, academics and, yes, students, use R to implement brand-new statistical methods as part of research projects (or, sometimes, just for fun). They gather these new functions into collections called “packages” and upload them to a section of CRAN dedicated to user contributions. (This is distinct from the area in CRAN where the official R distribution is found.) Some of these user-contributed packages are major bodies of work in their own right, regularly maintained and tested by their respective authors. Some are student projects, long since abandoned. Just as when using a SAS macro downloaded from a website, or installing a third-party Excel add-in, you’ll need to rely on the reputation of the author (or the recommendation of trusted peers) when deciding whether to use such third-party code.

If you’re in the habit of downloading packages from CRAN, how do you tell if a function you’re using is an official R function, and not a user-contributed one? One easy way is to use the function find, which will tell you which package the function comes from.  For example, let’s check the function nls (nonlinear least squares):

> find("nls")
[1] "package:stats"

This tells me that the nls function comes from the stats package. The official R distribution includes a number of standard packages. (These packages are divided into two groups, the “base” and “recommended” packages, but the distinction isn’t important here as they all fall under the same software development lifecycle and are all part of “official R” as defined above.) If the function comes from any of the following packages, it’s considered official:

Official R packages (Base and Recommended)

base, boot, class, cluster, codetools, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, methods, mgcv, nlme, nnet, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils

This list has grown as R has matured, but the list above is applicable to R version 2.7.2 and above, and REvolution R version 1.2.3 and above. 
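
The check described above can be wrapped in a small helper. This is only a sketch: the name is_official and the vector official_pkgs are made up for illustration, and the package list is simply the one quoted above, so it may be incomplete for newer R releases. Note also that find only searches packages currently attached to the session.

```r
# Package list copied from the post (R 2.7.2 era); treat it as an assumption
# for any other R version.
official_pkgs <- c("base", "boot", "class", "cluster", "codetools", "datasets",
                   "foreign", "graphics", "grDevices", "grid", "KernSmooth",
                   "lattice", "MASS", "methods", "mgcv", "nlme", "nnet",
                   "rpart", "spatial", "splines", "stats", "stats4",
                   "survival", "tcltk", "tools", "utils")

# Hypothetical helper: TRUE if the first place R finds the function
# is one of the packages listed above.
is_official <- function(fname) {
  where <- find(fname, mode = "function")   # e.g. "package:stats"
  pkgs <- sub("^package:", "", where)       # strip the "package:" prefix
  length(pkgs) > 0 && pkgs[1] %in% official_pkgs
}

is_official("nls")

# For comparison, the installed R can report its own base and recommended
# packages; priority = "high" selects both groups.
high_priority <- rownames(installed.packages(priority = "high"))
```

A function you have defined yourself is found in ".GlobalEnv" first, so the helper returns FALSE for it, as it does for a name that is not found at all.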

So, to sum up: R, drawing on the expertise and control of the R Core Group, has an excellent reputation for accuracy and reliability, on par with or even exceeding that of commercial software packages like SAS or SPSS. It’s suitable for any statistical analysis where you must rely on the results. All of this applies to the R distribution on CRAN, and to the REvolution R distribution, both of which comprise the official packages listed above. When it comes to user-contributed packages you download and install yourself, you’re no longer using code under the control of the R Core Group, in which case — as with all third-party code — you must rely on the reputation of the author of that package.

That’s a long answer to a seemingly simple question. But I hope it clears things up.
