How to Access 100M Time Series in R in Under 60 Seconds

August 26, 2011

DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now, with the new rdatamarket package, it’s trivially easy to import those time series into R for charting, analysis, or anything else. Here’s what you need to do:

  1. Register an account on DataMarket.com (it’s free)
  2. Install the rdatamarket package in R with install.packages("rdatamarket")
  3. Browse DataMarket.com for a time series of interest (I found this series on unemployment)
  4. Copy the URL of the page you’re on (the short URL works too, I used “http://data.is/qb61uf”)
  5. Use the dmseries function with the URL to extract the time series as a zoo object

Here’s an example:

> library(rdatamarket)
> dminfo("http://data.is/qb61uf")
Title: "Persons Unemployed 15 weeks or longer, as a percent of the civilian labor force"
Provider: "Federal Reserve Bank of St. Louis" (citing "U.S. Department of Labor: Bureau of Labor Statistics")
Dimensions:
> unemp <- dmseries("http://data.is/qb61uf")
> plot(unemp)
> str(unemp)
‘zoo’ series from Jan 1948 to Jul 2011
Data: num [1:763, 1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:763] "1" "2" "3" "4" ...
..$ : chr "Persons.Unemployed.15.weeks.or.longer..as.a.percent.of.the.civilian.labor.force"
Index: Class 'yearmon' num [1:763] 1948 1948 1948 1948 1948 ...

[Figure: US Unemployment, the chart produced by plot(unemp)]

With this package, you can go from finding interesting data on DataMarket to working with it in R in less than a minute. With such a wealth of data so readily available from within R, this will be a fantastic tool for data scientists and data journalists alike.
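
As a rough sketch of what "working with it" can look like once a series is in R as a zoo object, here are a few common manipulations from the zoo package. To keep the example self-contained and runnable offline, it uses a small synthetic monthly series (`demo`, with made-up values) as a stand-in for the imported `unemp` object; the variable names are illustrative and not part of rdatamarket.

```r
library(zoo)

# Synthetic stand-in for an imported series: 24 made-up monthly values
# (NOT DataMarket data), indexed by yearmon like the `unemp` object above.
idx  <- as.yearmon(2010 + (0:23) / 12)
demo <- zoo(seq(4.0, by = 0.1, length.out = 24), order.by = idx)

# Keep only the observations from January 2011 onward.
recent <- window(demo, start = as.yearmon("2011-01"))

# Collapse the monthly values to annual means, grouping by calendar year.
annual <- aggregate(demo, by = function(i) floor(as.numeric(i)), FUN = mean)

# Smooth the series with a centered 3-month rolling mean.
smooth <- rollmean(demo, k = 3)
```

The same calls should apply to a series returned by dmseries, with the caveat that `unemp` above is a one-column matrix rather than a plain vector, so results keep that column structure.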

CRAN: rdatamarket package