R: Refcards and Basic I/O Operations

December 26, 2008


While working with a large number of files, I used the following R commands for data processing. Since almost everyone needs to split, merge and append data, here is some code for splitting data based on a parameter, and for appending and merging data.

 

Splitting Data Based on a Parameter.

The following splits the data into two datasets: one with the rows in which a column contains "Male", and one with all the other rows.

Input and Subset

Note that the read.table command reads the file (its path is elided here as ....) into a data frame and assigns it the name x in the R environment. R is case sensitive, so x and X are different names.

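x <- read.table(....)
# grepl returns TRUE/FALSE for every row; the match is case sensitive
isMatch <- grepl("Male", x$col)
write.table(x[isMatch,], file="match")
# negating a logical vector still works when no row matches,
# whereas x[-grep(...),] would return zero rows in that case
write.table(x[!isMatch,], file="nomatch")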

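Suppose we need to divide the dataset into multiple datasets, for example one per region.


X17 <- subset(x, REGION == 17)

This is preferred to the following technique, because attach() puts the column names on the search path, where they can clash with other objects of the same name:
attach(x)
X17 = x[REGION == 17,]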
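
When the dataset has to be divided by every value of a column at once, base R's split() can do it in one call. The following is a minimal sketch, assuming a data frame with a REGION column as in the example above; the sample data, the SALES column and the region_<value>.txt file names are hypothetical.

```r
# Hypothetical sample data standing in for x; only the REGION column name
# comes from the example above.
x <- data.frame(REGION = c(17, 17, 21, 21, 35),
                SALES  = c(10, 20, 30, 40, 50))

# split() returns a named list with one data frame per distinct REGION value
by_region <- split(x, x$REGION)

# Write each subset to its own file in a temporary directory
# (the region_<value>.txt naming is an assumption for illustration)
out_dir <- tempdir()
for (r in names(by_region)) {
  write.table(by_region[[r]],
              file = file.path(out_dir, paste0("region_", r, ".txt")),
              row.names = FALSE)
}
```

Each subset is then available as by_region[["17"]], by_region[["21"]] and so on, so there is no need to write one subset() call per region.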

 

Output

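To write the data back out to a file in the Windows environment, you can use the command below. Put the output path in the file argument; with file="" the output is printed to the console instead.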

write.table(x,file="",row.names=TRUE,col.names=TRUE,sep=" ")
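
As a quick check of the output step, the sketch below writes a small data frame with the same arguments and reads it back. The sample data is hypothetical, and a temporary file stands in for the real output path.

```r
# Hypothetical sample data; the column names are illustrative only
x <- data.frame(name = c("A", "B"), value = c(1.5, 2.5))

# A temporary file stands in for the real output path
out_file <- tempfile(fileext = ".txt")

# Same arguments as above, but with a file path supplied
write.table(x, file = out_file, row.names = TRUE, col.names = TRUE, sep = " ")

# Read it back. The header line has one field fewer than the data lines,
# so read.table uses the first column as row names.
y <- read.table(out_file, header = TRUE)
```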

Append

Let's say you have a large number of data files (say, csv files) that you need to append after performing basic operations on them, assuming the files all have the same structure.

 

>setwd("C:\\Documents and Settings\\admin\\My Documents\\Data")

Note that this changes the working directory to the folder you want. Also note the double backslashes: they are needed in the path because a single backslash is an escape character in R strings.

>list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE,

+     recursive = FALSE, ignore.case = FALSE)

The R output would be something like below

 

[1] "cal1.csv" "cal2.csv"
[3] "cal3.csv" "cal4.csv"
[5] "cal5.csv" "cal6.csv"
[7] "cal7.csv" "cal8.csv"
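
If the folder also holds files of other types, the pattern argument of list.files can restrict the listing to csv files. A minimal sketch, using a temporary folder and hypothetical file names in place of the real data folder:

```r
# Work in a temporary directory with a few hypothetical files
data_dir <- file.path(tempdir(), "listing_demo")
dir.create(data_dir, showWarnings = FALSE)
file.create(file.path(data_dir, c("cal1.csv", "cal2.csv", "notes.txt")))

# pattern is a regular expression: "\\.csv$" matches names ending in ".csv"
csv_files <- list.files(path = data_dir, pattern = "\\.csv$")
csv_files
```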

 

Now you can use the file.append command to successively append each of the other files to the first file.

If writing a lot of similar code is tedious, use the & (concatenate) operator in Excel to generate the code. Note the formula bar (B7=A7&C7&D7&E7).

Excel is useful here because click and drag makes repetitive text easy to produce, and concatenation is easily done.

[Image: Excel sheet with the formula bar showing B7=A7&C7&D7&E7]

 

The output would be something like

>file.append("cal1.csv","cal2.csv")
[1] TRUE
>file.append("cal1.csv","cal3.csv")
[1] TRUE
>file.append("cal1.csv","cal4.csv")
[1] TRUE
>file.append("cal1.csv","cal5.csv")
[1] TRUE
>file.append("cal1.csv","cal6.csv")
[1] TRUE
>file.append("cal1.csv","cal7.csv")
[1] TRUE
>file.append("cal1.csv","cal8.csv")
[1] TRUE
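
Instead of generating the calls in Excel, the repetition can also be done in R itself with a loop. The following is a minimal sketch under stated assumptions: a temporary folder and three small hypothetical files stand in for cal1.csv to cal8.csv.

```r
# Set up a temporary directory with hypothetical files standing in for the real ones
data_dir <- file.path(tempdir(), "append_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(paste0("row from file ", i),
             file.path(data_dir, paste0("cal", i, ".csv")))
}

files  <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
target <- files[1]

# Append every remaining file to the first one, one call per file
for (f in files[-1]) {
  file.append(target, f)
}

readLines(target)
```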

 

Note that all the data here gets appended to the file cal1.csv.
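
One caveat: file.append copies the raw contents of each file, so if every csv file begins with a header row, those header rows end up repeated inside the combined file. A hedged alternative, assuming all files share the same columns, is to read each file into a data frame and stack the rows with rbind. The sample files and column names below are hypothetical.

```r
# Hypothetical csv files, each with its own header row
data_dir <- file.path(tempdir(), "rbind_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(data_dir, paste0("cal", i, ".csv")),
            row.names = FALSE)
}

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack the resulting data frames row-wise
combined <- do.call(rbind, lapply(files, read.csv))

# Write the combined data once, with a single header row
out_file <- tempfile(fileext = ".csv")
write.csv(combined, out_file, row.names = FALSE)
```

This reads all the data into memory, so for very large files the file.append approach may still be the more practical one.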

This should be a good starting point for you to try out R.

For a reference sheet, here is an excellent one from Tom Short, aptly called the Short Refcard (http://cran.r-project.org/doc/contrib/Short-refcard.pdf).

Note: Experienced analytics people are best served by www.rforsasandspssusers.com

Anyway, MeRRy ChRistmas!
