Data Mining Interview: Rob Hyndman

I recently discovered Cross Validated, a Q&A platform for statisticians and data miners. With Seth Rogers, Community Developer of Cross Validated, we conducted an interview with Rob Hyndman, Professor of Statistics and the person who proposed the platform. Thanks to both Seth and Rob for their collaboration.

Data Mining Research: Could you introduce yourself and explain your relationship to Analytics?

Rob Hyndman: I am Professor of Statistics at Monash University, Australia, and Editor-in-Chief of the International Journal of Forecasting. I’m probably best known for my work in statistical forecasting — I am author of the forecast package for R, and I’ve written a couple of books on forecasting.

DMR: What is Cross Validated and how does it help in your work?

RH: Cross Validated is a website where people can ask and answer questions about statistical topics. The site is part of the Stack Exchange network, all of whose sites are free, community-driven expert Q&A sites on particular topics. We [community members] interpret statistics quite broadly: the site is intended for statisticians, data miners, and anyone else doing data analysis.

When I proposed the site in April 2010 [in Stack Exchange’s new site hatchery called Area 51], I had in mind that it would be useful for the thousands of researchers who use statistical methods but may not have enough statistical training to be confident that they are choosing the best methods and implementing them appropriately. Having spent a couple of decades as a statistical consultant, I thought it would be nice to have a good site I could recommend to people, rather than trying to answer every question myself. It has turned into something much bigger than that, and is now a wonderful resource for everyone doing statistics, including those with years of experience.

Having proposed the site, I was one of the first moderators when it launched in July 2010. After about six months, I decided to take a back seat and some of the most reputable users formed a new moderation panel. They are doing a great job and I’m delighted to see the site being so active and obviously meeting the needs of so many people.

One of the nice things about CV, and other Stack Exchange sites, is that good answers get voted up, so it is easy to see what the community regards as the best answer. You can also see, from their reputation scores, which of the people answering have established a reputation for providing helpful advice. It is also extremely easy to find answers to past questions. This sets it apart from email lists and forums, where you have to search through badly formatted archives. Everything is tagged and searchable, and (as of 5 October 2011) there are more than 5,300 questions and more than 10,000 answers that provide a freely available repository of useful knowledge.

DMR: Do you recommend Cross Validated to your students and colleagues?

RH: I recommend CV all the time. I’ve promoted it on my blog, and I often refer people to CV when they send me questions by email, so it has helped filter out some of the questions that would otherwise land on my desk. I now usually suggest that if someone has a specific question, they ask on CV first. Very often the answers are faster and better than what I would have provided. There is a wonderful community of people on CV (more than 5,000 of them!) who are very helpful and willing to share their expertise.

DMR: Can you remember a particular problem or question that came up during a project that CV helped you solve?

RH: I tend to answer more questions than I ask, but occasionally I have asked a question, and I’ve always learned something from the answers. I learned a lot about causation when I asked “Under what conditions does correlation imply causation?”, including several references I was unfamiliar with. I’ve also often found that answers to my R questions are already available on the site in abundance.

Feel free to participate in Cross Validated.

Note: this interview will also be published on Amstatnews in December.
