SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

What is R?

DavidMSmith
What is R? It seems like a simple question, but I fear this is going to be a long post.

My main motivation is to clear up a bit of confusion around the distinction between R itself and user-contributed packages for R. It was prompted by a recent discussion about R on the MedStats mailing list, which included this comment:

After all the positive comments, I would like to raise some concern about some of the non-standard R packages. I have twice experienced a serious error in R packages (not a bug, an error in the algorithm). The authors of the first one did not reply; the authors of the second one said they know about it but do not have the time to fix it. I wonder how many — especially of the PhD/postdoc written packages, which I am sure work for their project — are really working correctly in all situations? Not all of them work on their packages as hard and great work as e.g. D. Bates with his lme4 (GLMM) package, and he and users still discover bugs and flaws in it. I do not want to criticize R; I am using it and I believe that the core packages are as valid as from commercial software (or better). But as I said, I have  doubts about some hardly used ones.

It’s a fair point, but it needs some clarification for readers not familiar with R. The distinction is one between “official R” and user-contributed code (which is what the commenter above is discussing).

By “official R” I mean the R project, under the control of the R Core Group. This is what you get when you download R from the CRAN website, and what’s included in the REvolution R distribution. It includes both the R interpreter (the code that implements the language at the heart of R) and the various statistical functions included in the official R distribution. These components and functions are all managed under a strict software development lifecycle and have the highest reputation for accuracy and reliability. This is what makes R suitable for any statistical analysis application where you need the utmost confidence in the result, such as the analysis of clinical trial data. This is R.

Now, R isn’t just a closed statistical analysis environment. It’s also designed to be a platform for other individuals to create their own methods and applications. Research institutions, academics and, yes, students use R to implement brand-new statistical methods as part of research projects (or, sometimes, just for fun). They gather these new functions into collections called “packages” and upload them to a section of CRAN dedicated to user contributions. (This is distinct from the area of CRAN where the official R distribution is found.) Some of these user-contributed packages are major bodies of work in their own right, regularly maintained and tested by their authors. Some are student projects, long since abandoned. Just as when using a SAS macro downloaded from a website, or installing a third-party Excel add-in, you’ll need to rely on the reputation of the author (or the recommendation of trusted peers) when deciding whether to use such third-party code.
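One practical way to start judging an author’s reputation is to read the metadata that every installed package carries in its DESCRIPTION file. The following is a minimal sketch, not an official tool: `package_provenance()` is a hypothetical helper name, while `packageDescription()` (utils) and `system.file()` (base) are standard R functions. It uses `stats` so it runs on any installation; substitute the name of a contributed package you have installed.

```r
# Hypothetical helper (not part of R): summarise who wrote and maintains
# an installed package, read from its DESCRIPTION file.
package_provenance <- function(pkg) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    stop("package not installed: ", pkg)
  }
  fields <- c("Package", "Version", "Author", "Maintainer", "Priority")
  desc <- packageDescription(pkg, fields = fields)
  # Fields absent from DESCRIPTION (e.g. Priority for contributed
  # packages) come back as NA; normalise everything to character
  vapply(fields, function(f) {
    val <- desc[[f]]
    if (is.null(val) || is.na(val)) NA_character_ else as.character(val)
  }, character(1))
}

# "stats" ships with R, so this runs anywhere; substitute a contributed
# package you have installed to see its author and maintainer
package_provenance("stats")
```

The Maintainer field gives you a contact address, and the Version field lets you check whether you are running the release the author currently supports.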

If you’re in the habit of downloading packages from CRAN, how do you tell whether a function you’re using is an official R function and not a user-contributed one? One easy way is to use the function find, which tells you which attached package the function comes from. For example, let’s check the function nls (nonlinear least squares):

> find("nls")
[1] "package:stats"

This tells me that the nls function comes from the stats package. The official R distribution includes a number of standard packages. (These packages are divided into two groups, the “base” and “recommended” packages, but the distinction isn’t important here, as they all fall under the same software development lifecycle and are all part of “official R” as defined above.) If the function comes from any of the following packages, it’s considered official:

Official R packages (Base and Recommended)

base, boot, class, cluster, codetools, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, methods, mgcv, nlme, nnet, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils

This list has grown as R has matured, but the list above is applicable to R version 2.7.2 and above, and REvolution R version 1.2.3 and above. 
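Because the list changes between R versions, you may prefer not to hard-code it. A possible alternative, sketched here under stated assumptions, combines `find()` with the `Priority` field of each package’s DESCRIPTION file: the Writing R Extensions manual reserves the values `base` and `recommended` for exactly these packages. `function_origin()` is a hypothetical helper name. Note that `find()` only searches packages currently attached to the search path, so load a package with `library()` before checking its functions.

```r
# Hypothetical helper (not part of R): for a function name, report which
# attached package(s) define it and whether each one ships with R itself.
function_origin <- function(fname) {
  # find() searches only the current search path, e.g. "package:stats";
  # keep package entries and drop others such as ".GlobalEnv"
  locations <- grep("^package:", find(fname), value = TRUE)
  if (length(locations) == 0) {
    return(data.frame(package = NA_character_, priority = NA_character_,
                      official = NA))
  }
  pkgs <- sub("^package:", "", locations)
  # Base and recommended packages declare "Priority: base" or
  # "Priority: recommended"; for other packages the field is missing,
  # which packageDescription() reports as NA
  priority <- vapply(pkgs, function(p) {
    val <- packageDescription(p, fields = "Priority")
    if (is.null(val) || is.na(val)) NA_character_ else as.character(val)
  }, character(1), USE.NAMES = FALSE)
  data.frame(package = pkgs, priority = priority,
             official = priority %in% c("base", "recommended"))
}

function_origin("nls")
```

A function found in more than one attached package (because one masks another) produces one row per package, so you can see exactly which definitions are official.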

So, to sum up: R, drawing on the expertise and control of the R Core Group, has an excellent reputation for accuracy and reliability, on par with or even exceeding that of commercial software packages like SAS or SPSS. It’s suitable for any statistical analysis where you must rely on the results. All of this applies to the R distribution on CRAN, and to the REvolution R distribution, both of which comprise the official packages listed above. When it comes to user-contributed packages you download and install yourself, you’re no longer using code under the control of the R Core Group, in which case — as with all third-party code — you must rely on the reputation of the author of that package.

That’s a long answer to a seemingly simple question. But I hope it clears things up.
