Packages for By-Group Processing in R

Analyst and BI expert Steve Miller takes a look at R’s facilities for “by-group” processing of data. The task he describes was to:

… read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a few predictive models and graph the results.

Then repeat the models and graphs for groups or sub-populations marked by distinct values of one or more dimension variables of interest.

The latter step is commonly referred to as “by-group processing.” SAS programmers will recognize it as syntax that runs a procedure once per group on a data set sorted by the grouping variable, something like:

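proc sort data = dblahblah; by vblahblah; run;
proc reg data = dblahblah; by vblahblah; model y = x; run; /* y and x are placeholder variables */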
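For comparison, here is a minimal base-R sketch of the same idea. It splits the data by the grouping variable, fits one regression per group, and collects the results. The data frame `df`, its columns `grp`, `x` and `y`, and the model formula are hypothetical stand-ins for the SAS placeholders above.

```r
# Hypothetical toy data: grouping column `grp`, predictor `x`, response `y`
set.seed(123)
df <- data.frame(grp = rep(c("a", "b", "c"), each = 20), x = rnorm(60))
df$y <- 1 + 2 * df$x + rnorm(60, sd = 0.1)

# split() plays the role of the BY statement: one data frame per group
pieces <- split(df, df$grp)

# Fit the same regression to each group and keep its coefficients
fits <- lapply(pieces, function(d) coef(lm(y ~ x, data = d)))

# Combine into one matrix: one row per group, columns "(Intercept)" and "x"
coefs <- do.call(rbind, fits)
print(coefs)
```

Unlike the SAS version, `split()` does not require the data to be sorted by the grouping variable first.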

Check out Steve’s post for how he addressed this in R using Matthew Dowle’s high-performance data.table package (as Steve suggests, the package’s example vignettes are a good place to get started).
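As a rough illustration of the data.table style, the sketch below fits one linear model per group using the `by` argument. This is not necessarily how Steve structured his code. The table `dt`, its columns, and the formula are hypothetical.

```r
library(data.table)

# Hypothetical toy data: grouping column `grp` plus predictor `x`
set.seed(42)
dt <- data.table(grp = rep(c("a", "b", "c"), each = 20), x = rnorm(60))
# Add a response column by reference with :=
dt[, y := 1 + 2 * x + rnorm(.N, sd = 0.1)]

# by = grp evaluates the j expression once per group
fits <- dt[, {
  m <- lm(y ~ x)
  list(intercept = coef(m)[[1]], slope = coef(m)[[2]], n = .N)
}, by = grp]

print(fits)
```

Inside `j`, `.N` is the number of rows in the current group. Using `keyby = grp` instead of `by = grp` would also sort the result by group and set it as the key.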

I’d also recommend the plyr package, which offers tools to split a data set by one or more criteria and then apply by-group processing to each piece. The “plyr: divide and conquer” guide is a good place to start. As an added bonus, you can spread the computations across multiple cores or nodes by registering a parallel backend for foreach. (Note for Windows users: the doSMP backend from Revolution R is now also available on R-Forge and will be on CRAN soon.)
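A minimal sketch of the same per-group regression with plyr’s `ddply`, which splits a data frame, applies a function to each piece, and combines the results into a new data frame. The data and column names are hypothetical.

```r
library(plyr)

# Hypothetical toy data
set.seed(7)
df <- data.frame(grp = rep(c("a", "b", "c"), each = 20), x = rnorm(60))
df$y <- 1 + 2 * df$x + rnorm(60, sd = 0.1)

# ddply: split df by grp, fit a model to each piece, row-bind the summaries
fits <- ddply(df, .(grp), function(d) {
  m <- lm(y ~ x, data = d)
  data.frame(intercept = coef(m)[[1]], slope = coef(m)[[2]], n = nrow(d))
})

print(fits)
```

Passing `.parallel = TRUE` to `ddply` dispatches the groups through foreach. This only helps once a backend has been registered, for example with `doParallel::registerDoParallel()`.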

Information Management: By-Group Processing, the R data.table and the Power of Open Source

SmartData Collective is one of the largest and most trusted communities covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT and more.
