© 2008-25 SmartData Collective. All Rights Reserved.
What is R?

DavidMSmith
What is R? It seems like a simple question, but I fear this is going to be a long post.

My main motivation is to clear up a bit of confusion around the distinction between R itself and user-contributed packages for R. It was prompted by a recent discussion about R on the MedStats mailing list, which included this comment:

After all the positive comments, I would like to raise some concern about some of the non-standard R packages. I have twice experienced a serious error in R packages (not a bug, an error in the algorithm). The authors of the first one did not reply; the authors of the second one said they know about it but do not have the time to fix it. I wonder how many — especially of the PhD/postdoc written packages, which I am sure work for their project — are really working correctly in all situations? Not all of them work on their packages as hard and great work as e.g. D. Bates with his lme4 (GLMM) package, and he and users still discover bugs and flaws in it. I do not want to criticize R; I am using it and I believe that the core packages are as valid as from commercial software (or better). But as I said, I have  doubts about some hardly used ones.

It’s a fair point, but needs some clarification for readers not familiar with R. The distinction is one between “official R” and user-contributed code (which is what the commenter above is discussing).

By “official R” I mean the R project, under the control of the R Core Group. This is what you get when you download R from the CRAN website, and what’s included in the REvolution R distribution. This includes both the R interpreter (the code that implements the language at the heart of R), and the various statistical functions included in the official R distribution. These components and functions are all managed under a strict software development lifecycle, and have the highest reputation for accuracy and reliability. This is what makes R suitable for all statistical analysis applications where you need the utmost confidence in the result, such as the analysis of clinical trial data. This is R.

Now, R isn’t just a closed statistical analysis environment. It’s also designed to be a platform for other individuals to create their own methods and applications. Research institutions, academics and, yes, students, use R to implement brand-new statistical methods as part of research projects (or, sometimes, just for fun). They gather these new functions into collections called “packages” and upload them to a section of CRAN dedicated to user contributions. (This is distinct from the area in CRAN where the official R distribution is found.) Some of these user-contributed packages are major bodies of work in their own right, regularly maintained and tested by their respective authors. Some are student projects, long since abandoned. Just as when using a SAS macro downloaded from a website, or installing a third-party Excel add-in, you’ll need to rely on the reputation of the author (or the recommendation of trusted peers) when deciding whether to use such third-party code.

If you’re in the habit of downloading packages from CRAN, how do you tell if a function you’re using is an official R function, and not a user-contributed one? One easy way is to use the function find, which will tell you which package the function comes from.  For example, let’s check the function nls (nonlinear least squares):

> find("nls")
[1] "package:stats"

This tells me that the nls function comes from the stats package. The official R distribution includes a number of standard packages. (These packages are divided into two groups, the “base” and “recommended” packages, but the distinction isn’t important here as they all fall under the same software development lifecycle and are all part of “official R” as defined above.) If the function comes from any of the following packages, it’s considered official:

Official R packages (Base and Recommended)

base, boot, class, cluster, codetools, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, methods, mgcv, nlme, nnet, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, utils

This list has grown as R has matured, but the list above is applicable to R version 2.7.2 and above, and REvolution R version 1.2.3 and above. 
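
The check described above can be wrapped in a small helper. What follows is only a minimal sketch: is_official and official_packages are hypothetical names introduced for illustration (they are not part of R), and the package list is simply copied from the list above, so it would need updating for other R versions. Note that find only searches packages that are currently attached, so a function from a package you haven't loaded won't be found.

```r
# Package names copied from the "official" list above (R 2.7.2 era).
# This vector is an assumption of the sketch; adjust it for your R version.
official_packages <- c(
  "base", "boot", "class", "cluster", "codetools", "datasets", "foreign",
  "graphics", "grDevices", "grid", "KernSmooth", "lattice", "MASS",
  "methods", "mgcv", "nlme", "nnet", "rpart", "spatial", "splines",
  "stats", "stats4", "survival", "tcltk", "tools", "utils"
)

# Hypothetical helper: TRUE only if every attached package that defines
# `fname` is on the official list. Definitions found outside packages
# (for example in the global workspace) are ignored.
is_official <- function(fname) {
  hits <- find(fname)                           # e.g. "package:stats"
  from_pkg <- hits[grepl("^package:", hits)]    # keep package entries only
  pkgs <- sub("^package:", "", from_pkg)        # strip the "package:" prefix
  length(pkgs) > 0 && all(pkgs %in% official_packages)
}

is_official("nls")
```

In a default session, where the stats package is attached, this should report nls as official. A function defined only in your own workspace, or one that isn't found at all, comes back FALSE.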

So, to sum up: R, drawing on the expertise and control of the R Core Group, has an excellent reputation for accuracy and reliability, on par with or even exceeding that of commercial software packages like SAS or SPSS. It’s suitable for any statistical analysis where you must rely on the results. All of this applies to the R distribution on CRAN, and to the REvolution R distribution, both of which comprise the official packages listed above. When it comes to user-contributed packages you download and install yourself, you’re no longer using code under the control of the R Core Group, in which case — as with all third-party code — you must rely on the reputation of the author of that package.

That’s a long answer to a seemingly simple question. But I hope it clears things up.
