Predicting popular stories on Digg

November 26, 2008

In its latest news, KDNuggets mentions a paper from HP Labs that outlines a process for analyzing and predicting the popularity of a Digg story or a YouTube video submission.

No question that this is interesting material. In my post dated October 16th, 2007, I presented an analysis of which keywords seem to play a part in a post becoming popular on Digg. You can find all 3 parts of the post here.

So I made a new run to collect stories from Digg, and this is an example of what I came up with (please note: for illustrative purposes only):

The paper from HP Labs takes a different route: it bases its predictions on the popularity of a submitted story in its first few hours rather than after some days. The authors also conclude that once a Digg story is out, users tend to vote for it early on, but after a certain threshold time has passed, the rate at which the story is digged fades away. By contrast, videos submitted to YouTube keep being viewed along a linear trend after they are submitted.
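To make the idea of predicting from early popularity concrete, here is a minimal sketch. This post does not describe the model used in the paper, so the log-linear relation between early and final vote counts is an assumption made for illustration, and the function names (`fit_log_linear`, `predict_final`) and all the numbers are hypothetical.

```python
import math


def fit_log_linear(early, final):
    """Least-squares fit of ln(final) = a + b * ln(early).

    Assumption: early and final vote counts are related log-linearly.
    This is an illustrative choice, not a description of the paper's model.
    """
    xs = [math.log(e) for e in early]
    ys = [math.log(f) for f in final]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_final(early_votes, a, b):
    """Predict the final vote count of a story from its early vote count."""
    return math.exp(a + b * math.log(early_votes))


# Made-up training data: diggs after the first few hours vs. diggs after some days.
early_diggs = [12, 30, 55, 80, 150, 300]
final_diggs = [60, 140, 260, 410, 700, 1500]

a, b = fit_log_linear(early_diggs, final_diggs)
print("Predicted final diggs for a story with 100 early diggs:",
      round(predict_final(100, a, b)))
```

With real data, the pairs would come from stories whose vote counts were recorded both early on and after they had stopped collecting votes.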

It is true that the news, and the way users ‘digg’ stories, is inherently seasonal. It is also interesting to look at buzzwords that keep recurring over time (in terms of how interesting they are, or are not).

Between the previous runs that I have made and the current one, I have seen some repeating patterns. One of these patterns shows Microsoft on a declining trend in terms of how interesting a subject it appears to be for Digg users. Here is what Google Trends shows about the term ‘Microsoft’:

Could such a trend be a glimpse of Microsoft’s ‘future’ somehow?

I have already built a text classifier that accepts a phrase and shows the probability of that phrase being highly ‘digged’, based on the keywords it contains. More on this in a future post.
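As a rough idea of what such a classifier can look like, here is a minimal sketch. It is not the classifier mentioned above: it assumes a naive Bayes model over keywords with add-one smoothing, the class and method names (`KeywordClassifier`, `train`, `prob_popular`) are hypothetical, and the training titles are made up.

```python
import math
import re
from collections import Counter


def tokenize(phrase):
    """Lowercase a phrase and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


class KeywordClassifier:
    """Naive Bayes over keywords (an assumed model, for illustration only)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, phrase, popular):
        """Record one phrase, labelled as popular (True) or not (False)."""
        self.doc_counts[popular] += 1
        self.word_counts[popular].update(tokenize(phrase))

    def prob_popular(self, phrase):
        """Probability that a phrase is popular; needs examples of both classes."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(phrase):
                if word not in vocab:
                    continue  # keywords never seen in training carry no evidence
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a probability without overflow.
        top = max(log_scores.values())
        pop = math.exp(log_scores[True] - top)
        unpop = math.exp(log_scores[False] - top)
        return pop / (pop + unpop)


# Made-up training titles; real labels would come from collected Digg stories.
clf = KeywordClassifier()
for title in ["Apple unveils new iPhone",
              "Google launches Chrome browser",
              "Obama wins election"]:
    clf.train(title, popular=True)
for title in ["Microsoft releases Vista patch",
              "Microsoft Office update notes",
              "Local council budget meeting"]:
    clf.train(title, popular=False)

print(clf.prob_popular("New iPhone browser"))
print(clf.prob_popular("Microsoft patch"))
```

A phrase made up only of keywords that were never seen in training falls back to the share of popular stories in the training set.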
