Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    image fx (67)
    Improving LinkedIn Ad Strategies with Data Analytics
    9 Min Read
    big data and remote work
    Data Helps Speech-Language Pathologists Deliver Better Results
    6 Min Read
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: What is R?
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > What is R?
Data MiningPredictive Analytics

What is R?

DavidMSmith
DavidMSmith
8 Min Read
SHARE
What is R? It seems like a simple question, but I fear this is going to be a long post.

My main motivation is to clear up a bit of confusion around the distinction between R itself and user-contributed packages for R. It was prompted by a recent discussion about R on the MedStats mailing list, which included this comment:

After all the positive comments, I would like to raise some concern about some of the non-standard R packages. I have twice experienced a serious error in R packages (not a bug, an error in the algorithm). The authors of the first one did not reply; the authors of the second one said they know about it but do not have the time to fix it. I wonder how many — especially of the PhD/postdoc written packages, which I am sure work for their project — are really working correctly in all situations? Not all of them work on their packages as hard and great work as e.g. D. Bates with his lme4 (GLMM) package, and he and users still discover bugs and flaws in it. I do not want to criticize R; I am using it and I believe that the core packages are as valid as from commercial software (or better). But as I said, I have  doubts …

What is R? It seems like a simple question, but I fear this is going to be a long post.

More Read

predictive analytics capabilities of blockchain
The Incredible Predictive Analytics Capabilities Of Blockchain
Edith on GT : A BI solution for Advanced Data Mining
Interactive Analysis and Related Tools – Part II
Business Analytics Error: Learn from Uber’s Mistake During the Sydney Terror Attack
Predictive Policing with Big Data
My main motivation is to clear up a bit of confusion around the distinction between R itself and user-contributed packages for R. It was prompted by a recent discussion about R on the MedStats mailing list, which included this comment:

After all the positive comments, I would like to raise some concern about some of the non-standard R packages. I have twice experienced a serious error in R packages (not a bug, an error in the algorithm). The authors of the first one did not reply; the authors of the second one said they know about it but do not have the time to fix it. I wonder how many — especially of the PhD/postdoc written packages, which I am sure work for their project — are really working correctly in all situations? Not all of them work on their packages as hard and great work as e.g. D. Bates with his lme4 (GLMM) package, and he and users still discover bugs and flaws in it. I do not want to criticize R; I am using it and I believe that the core packages are as valid as from commercial software (or better). But as I said, I have  doubts about some hardly used ones.

It’s a fair point, but needs some clarification for readers not familiar with R. The distinction is one between “official R” and user-contributed code (which is what the commenter above is discussing).

By “official R” I mean the R project, under the control of the R Core Group. This is what you get when your download R from the CRAN website, and what’s included in REvolution R distribution. This includes both the R interpreter (the code that implements the language at the heart of R), and the various statistical functions included in the official R distribution. These components and functions are all managed under a strict software development lifecycle, and have the highest reputation for accuracy and reliability. This is what makes R suitable for all statistical analysis applications where you need the utmost confidence in the result, such as the analysis of clinical trial data. This is R. 

Now, R isn’t just a closed statistical analysis environment. It’s also designed to be a platform for other individuals to create their own methods and applications. Research institutions, academics and, yes, students, use R to implement brand-new statistical methods as part of research projects (or, sometimes, just for fun). They collect these new functions into collections called “packages” and upload them to section of CRAN dedicated to user contributions. (This is distinct from the area in CRAN where the official R distribution is found.) Some of these user-contributed packages are major bodies of work in their own right, regularly maintained and tested by their respective authors. Some are student projects, long since abandoned. Just as when using a SAS macro downloaded from a website, or installing a third-party Excel add-in, you’ll need to rely on the reputation of the author (or the recommendation of trusted peers) when deciding whether to use such third-party code.

If you’re in the habit of downloading packages from CRAN, how do you tell if a function you’re using is an official R function, and not a user-contributed one? One easy way is to use the function find, which will tell you which package the function comes from.  For example, let’s check the function nls (nonlinear least squares):

> find(“nls”)
[1] “package:stats”

This tells me that the nls function comes from the stats package. The official R distribution includes a number of standard packages. (These packages are divided into two groups — the “base” and “recommended” packages — but the distinction isn’t important here as they all fall under the same software development lifecycle and are all part of “official R” as defined above.) If the comes from any of the following packages, it’s considered official:

Official R packages (Base and Recommended)

base, boot, class, cluster, codetools, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, methods, mgcv, nlme, nnet, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils

This list has grown as R has matured, but the list above is applicable to R version 2.7.2 and above, and REvolution R version 1.2.3 and above. 

So, to sum up: R, drawing on the expertise and control of the R Core Group, has an excellent reputation for accuracy and reliability, on par with or even exceeding that of commercial software packages like SAS or SPSS. It’s suitable for any statistical analysis where you must rely on the results. All of this applies to the R distribution on CRAN, and to the REvolution R distribution, both of which comprise the official packages listed above. When it comes to user-contributed packages you download and install yourself, you’re no longer using code under the control of the R Core Group, in which case — as with all third-party code — you must rely on the reputation of the author of that package.

That’s a long answer to a seemingly simple question. But I hope it clears things up.

TAGGED:revolution r
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

image fx (2)
Monitoring Data Without Turning into Big Brother
Big Data Exclusive
image fx (71)
The Power of AI for Personalization in Email
Artificial Intelligence Exclusive Marketing
image fx (67)
Improving LinkedIn Ad Strategies with Data Analytics
Analytics Big Data Exclusive Software
big data and remote work
Data Helps Speech-Language Pathologists Deliver Better Results
Analytics Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

Packages for By-Group Processing in R

2 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai chatbot
The Art of Conversation: Enhancing Chatbots with Advanced AI Prompts
Chatbots
giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?