Packages for By-Group Processing in R

Analyst and BI expert Steve Miller takes a look at the facilities in R for doing “by-group” processing of data. The task consisted of:

… read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a few predictive models and graph the results.
Then repeat the models and graphs for groups or sub-populations marked by distinct values of one or more dimension variables of interest.
The latter step is commonly referred to as “by-group processing.” SAS programmers will recognize by-group processing from syntax that invokes a procedure on a sorted data set, something like:
proc reg data = dblahblah; by vblahblah;
Check out Steve’s post for how he addressed this in R using the high-performance data.table package by Matthew Dowle (and as Steve suggests, a good place to get started is the example vignettes).
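To make the idea concrete, here is a minimal sketch of by-group modeling with data.table. It is not Steve's code: the built-in mtcars data set, the choice of cyl as the by-variable, and the mpg ~ wt model are all assumptions picked purely for illustration.

```r
library(data.table)

# Illustrative data only: mtcars ships with R; cyl plays the role of the by-variable.
dt <- as.data.table(mtcars)

# Fit mpg ~ wt separately within each cyl group and keep a few summary values.
# .SD is the subset of rows for the current group; .N is its row count.
fits <- dt[, {
  m <- lm(mpg ~ wt, data = .SD)
  list(n = .N, intercept = coef(m)[[1]], slope = coef(m)[[2]])
}, by = cyl]

print(fits)
```

The `by = cyl` argument does roughly the job of the SAS `by` statement: the expression in braces is evaluated once per group, and the per-group results are stacked into a single table.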
I’d also recommend the plyr package, which offers tools to split data sets by various criteria and then do by-group processing on the pieces. Here, the plyr: divide and conquer guide is a good place to start. As an added bonus, you can also spread the computations across multiple nodes in parallel by engaging a parallel backend for the foreach function. (Note for Windows users: the doSMP backend from Revolution R is also available now on R-Forge and will be on CRAN soon, too.)
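As a rough sketch of the split-apply-combine approach in plyr, the example below fits one model per group and collects the results. The mtcars data, the cyl grouping variable, and the helper name fit_one are assumptions chosen for illustration, not something taken from Steve's post.

```r
library(plyr)

# Hypothetical helper: fit mpg ~ wt on one piece of the data and
# return a one-row data frame of results.
fit_one <- function(df) {
  m <- lm(mpg ~ wt, data = df)
  data.frame(n = nrow(df), intercept = coef(m)[[1]], slope = coef(m)[[2]])
}

# Split mtcars by cyl, apply fit_one to each piece, and combine the rows.
by_cyl <- ddply(mtcars, .(cyl), fit_one)
print(by_cyl)

# Assuming a foreach backend has already been registered, the same call
# can process the groups in parallel:
# by_cyl <- ddply(mtcars, .(cyl), fit_one, .parallel = TRUE)
```

For a data set this small the parallel option would not pay off; it is shown only to indicate where the foreach backend plugs in.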
Information Management: By-Group Processing, the R data.table and the Power of Open Source
