Resampling Data in Hadoop with RHadoop

February 28, 2013

On Revolution Analytics partner Cloudera’s blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He provides the complete map-reduce code in the R language, as well as a useful script for installing RHadoop on a Cloudera instance.  
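To show why resampling fits the map-reduce model, here is a minimal sketch of one common technique, the Poisson bootstrap. Each mapper emits every record k times per replicate, where k is drawn from Poisson(1). This approximates sampling with replacement without any node needing to know the total data set size. This is an illustration only and is not necessarily the approach in Uri's guide. It is written in plain Python rather than R, it does not use the RHadoop/rmr API, and every function name (`poisson1`, `bootstrap_mapper`, `mean_reducer`, `run_local`) is hypothetical. The "shuffle" is simulated in memory.

```python
import math
import random
from collections import defaultdict


def poisson1(rng):
    """Draw a count from Poisson(lambda=1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bootstrap_mapper(record, num_replicates, rng):
    """Emit (replicate_id, record) pairs; each record appears Poisson(1) times
    per replicate, approximating sampling with replacement."""
    for replicate_id in range(num_replicates):
        for _ in range(poisson1(rng)):
            yield replicate_id, record


def mean_reducer(replicate_id, values):
    """Compute a per-replicate statistic: (replicate size, sample mean)."""
    return replicate_id, (len(values), sum(values) / len(values))


def run_local(data, num_replicates, seed=0):
    """Simulate map -> shuffle -> reduce in memory (no Hadoop involved).
    A replicate that received no records simply does not appear in the result."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in data:  # map phase
        for key, value in bootstrap_mapper(record, num_replicates, rng):
            groups[key].append(value)  # shuffle: group values by key
    # reduce phase: one statistic per bootstrap replicate
    return dict(mean_reducer(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    data = [float(x) for x in range(1000)]
    for rid, (size, mean) in run_local(data, num_replicates=5, seed=42).items():
        print(rid, size, round(mean, 2))
```

Each replicate's size varies around n (here about 1000) instead of being exactly n. That variation is the price of letting every record be processed independently, and it is what makes the scheme easy to parallelize. In a bagging setting, the reducer would fit a model on each replicate rather than compute a mean.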

By the way, if you’re new to RHadoop, here’s RHadoop creator and project leader Antonio Piccolboni introducing RHadoop at last year’s Strata CA conference.

Cloudera blog: How-to: Resample from a Large Data Set in Parallel (with R on Hadoop)