SmartData Collective

Learning R

DavidMSmith

When R is brought up among non-R users as a possibility for doing statistics, data mining, or any sort of predictive analytics, someone will invariably point out that R has a “steep learning curve”, and the response among those gathered usually includes a significant amount of head nodding. Even those who have put in heroic efforts to help people learn R sometimes say scary things. For example, in his introduction to Rattle, a data mining GUI for R, Graham Williams writes:

 R offers a breadth and depth in statistical computing beyond what is available in commercial closed source products. Yet R remains, primarily, a programming language for the highly skilled statistician, and out of the reach of many. (The R Journal Vol. 1/2, December 2009)

Are things really that bad? Is R that difficult to learn? I think not – unless you take the statement to refer to the worst-case scenario you can think of: the effort involved in learning R for a person who has no background in statistics or data analysis and absolutely no interest in learning R. In this context, the statement that R has a steep learning curve conveys the same truth as the assertion that Chinese or Italian or Japanese or English or any other language is difficult to learn if you don’t know anything about the people who speak it, don’t want to know, and wouldn’t have an opportunity to practice the language anyway. There is some truth to this, but so what? Context is everything.

If you have some background in statistics and a desire to learn more, learning R is not going to be an insurmountable problem. Once you have some version of R installed on your favorite computing platform, any book that provides carefully worked examples of the kinds of statistical analyses that interest you should be all that is required to make you productive. These days, there are dozens or maybe even hundreds of statistics books that use R. Two of my personal favorites are John Fox’s classic An R and S-Plus Companion to Applied Regression (Sage Publications, 2002) and Data Analysis and Graphics Using R: An Example-Based Approach (Cambridge Series in Statistical and Probabilistic Mathematics) by John Maindonald and W. John Braun (2010). Books, of course, are only the tip of the iceberg of the resources available for the statistical cognoscenti to learn R. Have a look at the links on Inside-R as places to start on the web.

This discussion does raise the issue of what one means by learning a language. Knowing how to run the R scripts required to get some simple analysis done may not entitle you to say that you know R (significantly more is required to become an R developer), any more than understanding enough Italian to get around Rome while eating well entitles you to say that you know Italian – but again, so what? It is possible to do some pretty impressive statistics using R without any deeper understanding of the R language than what is required to run the applicable models. On the other hand, if you don’t know any statistics and you don’t want to know more than you have to, then that might be a big problem. But it’s the statistics part that has the steep learning curve, not R. Maybe this is the point of Williams’ comment: R remains the language of highly skilled statisticians because it is the language most capable of expressing what highly trained statisticians think about, and R is out of the reach of the many who just don’t care much about statistics. But if you are reading this post, that “many” probably doesn’t include you.

So suppose you are not a statistician but belong to some other data analysis culture (data mining, for example) and you want to learn R. Well, like any of the world’s great natural languages, R is approachable from many different starting points. The entry point may be different, but the path to knowledge is going to be similar. Find something you care about and start formulating simple sentences. There are fewer R language guides available for people who have some data mining skills, but that is changing. Seni and Elder have recently published a very nice little book on ensemble methods that I highly recommend: Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions (Synthesis Lectures on Data Mining and Knowledge Discovery). And, of course, the masters Hastie, Tibshirani and Friedman, authors of the classic text The Elements of Statistical Learning, have made both the text and much of their code available. Moreover, if you are a data miner with a background in computer science, you can probably code rings around the highly trained statisticians. If so, you may find the books by Chambers (Software for Data Analysis) and Gentleman (R Programming for Bioinformatics) of great value.

Finally, to be perfectly fair, I should acknowledge the context in which Graham Williams wrote his comment. Like any other language, R is much easier to learn when you have access to the best learning tools: CDs, dictionaries, grammars, etc. Rattle is one such high-value learning tool, and so is Revolution’s R Productivity Environment. But more about these on another day.
