Harvard Gets Access to Twitter Data Stream to Predict Foodborne Illness Outbreaks

Harvard Medical School, in partnership with Boston Children’s Hospital, recently secured a research grant to help find new ways to track outbreaks of foodborne illness. Unlike most grants, though, this one isn’t a monetary award, nor did it come from a government agency or research foundation. Instead, the award is access to tweets, provided by Twitter as part of its inaugural Data Grants initiative.

The idea behind the pilot program is to give select research institutions access to the (highly lucrative) Twitter data stream in order to leverage social media for social good. Harvard will be able to tap into both Twitter’s public stream and its archive of historical data, going back to the first tweet. Other Data Grant recipients include the University of East London and UCSD.

Harvard and the Boston Children’s Hospital will attempt to correlate patterns in the Twitter data with records of foodborne illness, in order to predict and track future outbreaks. If this sounds familiar, it’s because Google created a similar tracking index for the flu in 2008. Google’s method relied on search data from millions of users worldwide to predict how the flu would spread. However, researchers have recently criticized Google’s methods, pointing out that the search giant overestimated the severity of flu outbreaks in 2011-12 by more than 50%.
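To make the idea of correlating tweet patterns with illness records concrete, here is a minimal sketch in Python. It is not the researchers’ method, which the article does not describe; it simply computes a Pearson correlation between two weekly series. The function name `pearson` and both data series are hypothetical and invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean, and the two sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for illustration only.
weekly_tweets = [12, 15, 40, 38, 14, 11]  # tweets mentioning food-poisoning terms
weekly_cases = [3, 4, 11, 9, 4, 2]        # reported foodborne illness cases

r = pearson(weekly_tweets, weekly_cases)
print(f"correlation between tweet volume and reported cases: {r:.2f}")
```

A value near 1 would suggest that tweet volume rises and falls with reported cases, but a correlation alone does not show that tweets can predict outbreaks. The criticism of Google’s flu index illustrates how such signals can drift.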

The lead researcher for the Harvard project, John Brownstein, told the Harvard Crimson that Twitter offers an advantage over other internet-collected data because the social platform allows for individual follow-ups. If researchers can verify a link between a user’s tweets and actual foodborne illness by corresponding directly with the individual over Twitter, they can fine-tune their predictive model and verify its results.

To carry out such a study, Harvard and Boston Children’s Hospital will likely use some of the same SaaS business intelligence tools that private companies use. As large-scale data analysis begins to play a more important role in clinical studies and research, the market for both analytics software and large-scale data sets is likely to grow.

Twitter limits access to their so-called “firehose” (a direct stream of real-time tweets) to select partners, who then resell pieces of it to other companies. While it’s unclear how much this access costs, Twitter generated over $53 million from such data licensing in the first half of 2013.

Looked at cynically, the #DataGrants program could be designed to demonstrate the validity and accuracy of Twitter’s data, which would allow the company to raise its asking price for data access. At the very least, it’s likely to produce positive publicity.

Whether Harvard’s initiative is a success won’t be known for some time. However, if it does prove effective, other social media networks such as Facebook, or rising platforms like WhatsApp, may come under pressure to allow research access to their own data sets. They may also embrace the opportunity to show off their targeting abilities. After all, as companies compete for advertising revenue, those with the most precise insight into customer behavior will be the ones left standing. What better way to show that than through an Ivy League-endorsed research study?