Personalizing your RSS Feeds

December 22, 2008
78.8%. This is the estimated accuracy with which an algorithm predicts what kind of information I like to read, and therefore which news items are likely to interest me.

The system proves its worth every day, and it helps a lot because it instantly spots the information I like among hundreds of RSS news headers. It is much more than simple keyword matching: because it takes combinations of words into account, it can tell news items apart by how interesting they are, even when they are about similar concepts.

Personalization of RSS feeds is a well-known application of text classification. Each piece of information, the header, is almost always two to three sentences long, which makes it ideal input for a classifier. The software I built is quite simple. First, I have a list of about 10 RSS sources: financial, medical, international news, tech news, etc. The application scans these feeds every 20 minutes and appends each new header to a text file on my hard disk.
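A minimal Python sketch of the collection step, assuming the sources are RSS 2.0 feeds and that a "header" is an item's title plus its description. The feed URLs and the `headers.txt` file name are hypothetical placeholders; the post does not name its sources or files.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical source list; the post mentions about 10 feeds
# (financial, medical, international news, tech news, ...) without naming them.
FEEDS = [
    "https://example.com/finance/rss.xml",
    "https://example.com/tech/rss.xml",
]
POLL_SECONDS = 20 * 60          # scan every 20 minutes, as in the post
HEADERS_FILE = "headers.txt"    # hypothetical file name


def parse_headers(rss_xml):
    """Return one 'title. description' string per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)  # accepts str or bytes
    headers = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        desc = (item.findtext("description") or "").strip()
        text = f"{title}. {desc}" if desc else title
        # Collapse whitespace so every header fits on a single line of the file.
        headers.append(" ".join(text.split()))
    return headers


def append_new(headers, seen, path):
    """Append headers not seen before to the text file; return how many were written."""
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for h in headers:
            if h and h not in seen:
                seen.add(h)
                f.write(h + "\n")
                count += 1
    return count


def poll_forever():
    """Fetch every feed, store new headers, then sleep until the next scan."""
    seen = set()
    while True:
        for url in FEEDS:
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    append_new(parse_headers(resp.read()), seen, HEADERS_FILE)
            except Exception as exc:  # one bad feed should not stop the loop
                print(f"skipping {url}: {exc}")
        time.sleep(POLL_SECONDS)
```

Call `poll_forever()` to start the loop. The `seen` set lives only in memory, so after a restart, headers already on disk would be appended again; a longer-running tool would persist it.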

When I first ran the application, I simply saved all headers to my hard disk and built my first text classifier. But that was back then. Today the classifier scans the RSS feeds and automatically appends each RSS header to either the “Interesting” text file or the “Uninteresting” text file, and it does so correctly most of the time.
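The post does not say which classification algorithm it uses. As one plausible stand-in, here is a sketch of a multinomial Naive Bayes classifier over word unigrams and adjacent-word bigrams (the bigrams are one simple way to capture the "combination of words" mentioned earlier), plus a hypothetical `route` helper that appends a header to the file for its predicted label.

```python
import math
import re
from collections import Counter


def features(text):
    """Lower-cased word unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: iterable of (header_text, label) pairs
        self.doc_counts = Counter()
        self.token_counts = {}
        self.vocab = set()
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.token_counts.setdefault(label, Counter())
            for tok in features(text):
                counts[tok] += 1
                self.vocab.add(tok)
        self.totals = {lab: sum(c.values()) for lab, c in self.token_counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood of each feature, per label."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, counts in self.token_counts.items():
            s = math.log(self.doc_counts[label] / n_docs)
            for tok in features(text):
                s += math.log((counts[tok] + 1) / (self.totals[label] + v))
            scores[label] = s
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def route(header, model, files):
    """Append the header to the file of its predicted label, e.g. files["Interesting"]."""
    label = model.predict(header)
    with open(files[label], "a", encoding="utf-8") as f:
        f.write(header + "\n")
    return label
```

Naive Bayes is a common baseline for short texts because training is a single counting pass, which keeps frequent retraining cheap.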

Whenever I have some spare time, I review the classified headers and correct my classifier's mistakes by moving misfiled headers into the right file. I then retrain the classifier, and everything is ready for the next run.
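The review-and-retrain cycle can be sketched as two file operations, assuming one header per line in each label file. `move_header` and `load_examples` are hypothetical helpers; the `(text, label)` pairs they produce can be fed to whatever classifier is in use.

```python
import os


def load_examples(files):
    """Read labeled training data; files maps each label to its one-header-per-line file."""
    examples = []
    for label, path in files.items():
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf-8") as f:
            examples.extend((line.rstrip("\n"), label) for line in f if line.strip())
    return examples


def move_header(header, src, dst):
    """Correct a misclassification: remove the first matching line from src, append it to dst."""
    with open(src, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if header not in lines:
        raise ValueError(f"header not found in {src!r}")
    lines.remove(header)  # removes only the first occurrence
    with open(src, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    with open(dst, "a", encoding="utf-8") as f:
        f.write(header + "\n")
```

After a batch of corrections, retraining amounts to rebuilding the model from `load_examples(files)`.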


Interestingly, the more I use the classifier, the less often it needs retraining. During the first week I had to build a new model almost every day, but not anymore.
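One hypothetical way to make the retraining frequency explicit is to count, during each review, how many headers had to be moved. The share left untouched is a rough accuracy estimate, and a retraining policy can fire only when that estimate drops below a threshold; as the model improves, the threshold would be crossed less often. The threshold values below are assumptions, and this is not necessarily how the 78.8% figure was estimated.

```python
from dataclasses import dataclass


@dataclass
class ReviewLog:
    """Counts from reviewing classified headers since the last model was built."""
    reviewed: int = 0
    corrected: int = 0

    def record(self, was_correct):
        self.reviewed += 1
        if not was_correct:
            self.corrected += 1

    def estimated_accuracy(self):
        # Share of reviewed headers the classifier filed correctly.
        return 1.0 if self.reviewed == 0 else 1 - self.corrected / self.reviewed


def should_retrain(log, min_accuracy=0.75, min_reviewed=50):
    """Hypothetical policy: retrain once enough headers are reviewed and accuracy is below min_accuracy."""
    return log.reviewed >= min_reviewed and log.estimated_accuracy() < min_accuracy
```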

So RSS personalization is an application worth looking at. First, it saves a lot of time. Second, really useful applications can emerge from it. For example, consider an investor who wants to know whether something significant enough to move the markets, for better or for worse, shows up in the news. As I described in a previous post, I flag every news header as either important or unimportant. So if a classifier can tell accurately enough what is important, an investor could receive e-mail alerts about the event, or even SMS messages when he is not online. The message might even include an estimate of how much a stock or an index is likely to be affected by the breaking news.
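A sketch of the e-mail alarm, assuming some classifier returns a probability `p_important` that a header is important. The threshold, addresses, and SMTP host are placeholders; SMS delivery would need a separate gateway and is not shown.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert(header: str, p_important: float, threshold: float = 0.9,
                sender: str = "alerts@example.com",
                recipient: str = "investor@example.com") -> Optional[EmailMessage]:
    """Return an alert e-mail if p_important reaches the threshold, else None."""
    if p_important < threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Market-relevant news ({p_important:.0%} confidence)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(header)
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Deliver the message through an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Keeping message construction separate from delivery makes the threshold logic easy to check without a mail server.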

There is also the personalization that comes from collaborative filtering. However, I believe that a “personal news classifier”, if I may call it that, does a much better job in terms of predictive accuracy once it has been trained long enough.
