Big Data Trees with Hadoop HDFS
Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets, using a new parallel external-memory algorithm included in the RevoScaleR package. It also introduced the possibility of applying this and the other big-data statistical methods in RevoScaleR to data files distributed in Hadoop's HDFS file system*, using the Hadoop nodes themselves (with Revolution R Enterprise installed) as the compute engine. Revolution Analytics' VP of Development Sue Ranney explained how this works in a recent webinar. I've embedded the slides below, and you can also watch the webinar recording on YouTube.
[*] Or, to use the Department of Redundancy Department-approved acronym, HHDFSFS.