SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Where to Find Data to Use with R

DavidMSmith
Last updated: 2011/10/11 at 7:57 PM

(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you’re looking for data to use in the Applications of R Contest — entries close October 31 — this is a great resource for you — Ed.)

Hardly a day goes by without someone or something reminding me that we are drowning in a sea of data (a bummer day), or that the new hero is the data scientist (a Yes! let’s go make some money kind of day!!). This morning I read “…Google grew from processing 100 terabytes of data a day with MapReduce in 2004 to processing 20 petabytes a day with MapReduce in 2008” (Lin and Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool, 2010, p. 1). Assuming linear growth, that would mean Google processed about 400 terabytes during the 15 minutes it took me to check my email. Even if Google is getting more than its fair share, data should be everywhere: more data than I could ever need, more than I could process, more than I could ever imagine.
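
For anyone who wants to check that back-of-the-envelope figure, here is a minimal sketch of the arithmetic in R. It assumes 1 petabyte = 1000 terabytes and that the 2004-2008 line is extrapolated to 2011 (the year of this post); both are assumptions of the sketch, not something the quoted source states. Under those assumptions the result is roughly 360 terabytes, the same order of magnitude as the figure quoted above.

```r
# Daily volumes quoted from Lin and Dyer, in terabytes per day
tb_2004 <- 100
tb_2008 <- 20 * 1000   # 20 petabytes, assuming 1 PB = 1000 TB

# Linear growth: a constant increase per year between the two data points
slope_per_year <- (tb_2008 - tb_2004) / (2008 - 2004)

# Assumption: extend the same straight line to 2011
tb_2011 <- tb_2008 + slope_per_year * (2011 - 2008)

# Fraction of a day taken up by 15 minutes
fraction_of_day <- 15 / (24 * 60)

# Estimated terabytes processed in 15 minutes
tb_in_15_min <- tb_2011 * fraction_of_day
round(tb_in_15_min)
```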

So, how come every time I go to write a blog post or try some new stats I can never find any data? A few hours ago I Googled “free data sets” and got over 74,000,000 hits, but it looks as if it’s going to be another evening of me with iris. What’s wrong here? At the root, it’s a deep problem that gets at the essence of data. What are data anyway? My answer: data are structured information. Part of the structure includes meta-information describing the intention and the integrity with which the data were collected. When looking for a data set, even for some purpose that is not that important, we all want some evidence that the data were either collected with intentions similar to our own intentions for using them, or that the data can be re-purposed. Moreover, we need to establish some comfort level that the data were not collected to deceive, and that they are reasonably representative, reasonably randomized, reasonably unbiased, etc. The more importance we place on our project, the more we tighten up on these requirements. This is not all philosophy. I think that focusing on intentions and integrity provides some practical guidance on where to search for data on the internet.

Historic financial data are relatively easy to find because the intentions with which they were collected are clear, and Yahoo, Google, FRED (the St. Louis Fed), Oanda for currency data and others have made it their business to collect and maintain these data. Visit quantmod for R code to read in data from these sites and even more places to find financial data. The next places high on the data intentions and integrity scale are the world’s government agencies: federal, regional and municipal. Data.gov, the official website of the United States government, the Census Bureau, the Department of Energy, the FBI, and other agencies have interesting data sets to offer. The National Institutes of Health even offers some data sets in R format. Here is a microarray data set.
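
As a rough illustration of the quantmod route mentioned above, here is a minimal sketch. It assumes the quantmod package is installed and that a network connection is available. The helper names `read_fred` and `read_yahoo` are hypothetical wrappers introduced for this example; `getSymbols()` and its `src`, `from` and `auto.assign` arguments are part of quantmod.

```r
# Hypothetical helper: fetch one FRED series as an xts object.
# auto.assign = FALSE makes getSymbols() return the data instead of
# creating a variable in the workspace.
read_fred <- function(series_id) {
  quantmod::getSymbols(series_id, src = "FRED", auto.assign = FALSE)
}

# Hypothetical helper: fetch daily prices for one ticker from Yahoo,
# starting at the given date (format "YYYY-MM-DD").
read_yahoo <- function(ticker, from) {
  quantmod::getSymbols(ticker, src = "yahoo", from = from,
                       auto.assign = FALSE)
}
```

Calling `read_fred("DGS10")` would request the 10-year Treasury yield series from FRED, and `read_yahoo("SPY", "2011-01-01")` would request daily prices for that ticker; both identifiers are illustrative choices, and either call fails without network access.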

Don’t just confine your search to the US. The UK is on a mission to open up the government. And don’t just look at the federal level. Look here for a fairly clean data set on London (UK) municipal waste management and here for the dirt on a thousand or so NYC taxi complaints. The Guardian is trying to make it easy to surf the data from the world’s governments.

The sweet spots for data sets on varied topics having thousands to a few million records are the professional data set aggregators such as infochimps, datamarket and datamob.org. Some of these sites offer data sets for sale as well as offering some free data. They all seem to do a good job of describing the data and get high intention and integrity scores. KDnuggets tracks data sets that are large enough to use for data mining projects.

For the very ambitious: check out the free data sets that are available for analysis in Amazon’s cloud.

Finally, there are several bloggers and wiki makers out there trying to build, annotate and maintain their own lists. Some of my favorites are at blogspot and quora, the Januarist and Revolution. Stackexchange tracks data sources that have R interfaces. And, of course, there are those willing to make interesting data available for a song.

I’m maintaining a list of public data sources at inside-R.org. Please let me know what I have missed.
