Data mining competition: HIV progression

July 8, 2010

Kaggle, a data-prediction company, is inviting numerati the world over to mine a small set of data and predict patterns in HIV. Offering $500, the competition has already drawn in 71 individuals and teams, and some of them have submitted dozens of predictions.

As we saw in the famous Netflix data-mining contest, these competitions can generate a lot of activity and research. And while Netflix offered a million dollars, Kaggle demonstrates that big money is not necessary. Researchers are drawn by the challenge, the data, and the chance to mingle with (and show off before) their global peers. What’s more, these competitions are wonderful for networking. As competitors scan the other studies, they see new approaches, meet new colleagues, and often team up. The winning Netflix team was a coalition of people who met on the competition site.

Kaggle founder Anthony Goldbloom argues that competitions can move scientific research far faster than the traditional process involving peer-reviewed papers. In a blog post, he writes:

Whereas scientific literature tends to evolve slowly (somebody writes a paper, somebody else tweaks that paper and so on), a competition inspires rapid innovation by introducing the problem to a wide audience. There are an infinite number of approaches that can be applied to any modeling task and it is impossible to know at the outset which technique will be most effective. By exposing a problem to a wide audience, competitions expose the problem to a range of different techniques. This maximises the chances of finding a solution, and gets the most out of any particular dataset - given its inherent noise and richness.