Prediction Is Hard, Especially About The Future

August 19, 2009

That Niels Bohr certainly knew what he was talking about! But that hasn’t discouraged folks in any number of industries from trying to make predictions.

Google in particular has been researching the predictability of search trends (just to be fair and balanced, so have Bing and Yahoo). Yossi Matias, Niv Efron, and Yair Shimshoni at Google Labs Israel have made some fascinating observations based on Google Trends, including the following:

  • Over half of the most popular Google search queries are predictable in a 12-month ahead forecast, with a mean absolute prediction error of about 12%.
  • Nearly half of the most popular queries are not predictable (with respect to the model we have used).
  • Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food & Drink (67%) and Travel (65%).
  • Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks & Online Communities (27%).
  • The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of less than 6%.
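
To make "mean absolute prediction error" concrete, here is a minimal sketch of how such a figure might be computed for a 12-month-ahead forecast. This is not the Google team's model, which isn't described in this post: the seasonal-naive forecast (repeat last year's values) is a stand-in I've assumed for illustration, the function names are hypothetical, and the monthly numbers are invented.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Average of |actual - forecast| / |actual|, as a percentage.

    Assumes every actual value is non-zero.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def seasonal_naive_forecast(history, season_length=12):
    """Forecast the next season by repeating the last observed one.

    An assumed stand-in model, not the one used in the Google paper.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    return list(history[-season_length:])


# Invented monthly search volumes for a seasonal query (illustrative only).
last_year = [50, 48, 55, 60, 72, 90, 95, 93, 70, 58, 52, 51]
this_year = [52, 47, 57, 63, 70, 94, 99, 90, 73, 60, 50, 53]

forecast = seasonal_naive_forecast(last_year)
error_pct = mean_absolute_percentage_error(this_year, forecast)
print(f"Mean absolute prediction error: {error_pct:.1f}%")
```

A strongly seasonal query, which I'd guess describes much of Health or Travel, will score a low error even with a model this crude, while a query driven by one-off events will not.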

You can read their full 32-page paper here.

I’m not surprised at the predictability of human search behavior, especially for stable topics or even for unstable ones viewed as aggregates – one could argue the celebrities and scandals du jour are unpredictable but interchangeable. What I’m curious about is what we can do with this predictability.

In the SIGIR ’09 session on Interactive Search, Peter Bailey talked about “Predicting User Interests from Contextual Information,” analyzing the predictive performance of contextual information sources (interaction, task, collection, social, historic) for different temporal durations. Max Van Kleek wrote a nice summary of the talk at the Haystack blog. The paper doesn’t investigate seasonality (perhaps because the authors only looked at four months of data), though I’d imagine they would subsume it under the broader categories of historic and social context. They do, however, set a clear goal:

Post query navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity… Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.

I hope Google is following a similar agenda. If you’re going to go to the trouble of predicting the future, then help make it a better one for users!
