Statistical Learning Papers

I’ve been lucky with the past few papers I’ve read, which have been interesting and well-written. The first two were background on a familiar topic, while the second two are my first look at a theory I haven’t yet read in detail.

These are the support vector machine classics:
1) one introducing SV regression
2) and another introducing ν-SVR (ν = Greek ‘nu’)

I think the second one is better written. In the second one, Schölkopf also presented an idea I haven’t seen show up since, the ‘parametric insensitivity tube’ (p.5,6). It doesn’t seem practical though.
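For anyone who hasn’t met the ‘tube’ before, here is a minimal sketch of the ordinary fixed-width version, the ε-insensitive loss that SV regression is built on: residuals inside the tube cost nothing, and residuals outside it are penalised linearly. The function name and the sample numbers are my own, made up for illustration, and this is not code from either paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Made-up residuals around a prediction of 0.0, with tube half-width 0.1.
targets = [0.05, 0.3, -0.5]
losses = [eps_insensitive_loss(t, 0.0, eps=0.1) for t in targets]

# Only the points outside the tube contribute any loss.
outside = [t for t, loss in zip(targets, losses) if loss > 0.0]
print(losses, outside)
```

As I understand it, in ε-SVR you pick eps yourself, whereas ν-SVR lets the optimiser choose the tube width and has you set ν instead, which roughly controls the fraction of points allowed to fall outside the tube.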

SVMs were apparently born in AT&T’s Bell Labs and are considered state-of-the-art for many problems. But it appears Microsoft Research has a competing project (and true to their reputation, it’s patented).

Relevance Vector Machines were introduced in 2000 with this bold and provocative abstract:

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a trade-off parameter and the need to utilise ‘Mercer’ kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions. [emphasis added]

Here are two more papers, the new RVM classics, both by Tipping. RVMs seem promising for financial forecasting because they have one fewer parameter to tune than SVR (eliminating the trade-off parameter C, but unfortunately keeping the kernel parameters, such as the width in the case of a Gaussian RBF kernel).
3) introducing the Relevance Vector Machine
4) it looks like one year later Tipping fleshed out the theory some more and published a detailed version
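To make the “identical functional form” and the leftover width parameter concrete, here is a rough sketch of the prediction function that SVR and the RVM share: a weighted sum of kernels centred on some of the training points. The weights, basis points and function names below are invented for illustration, and the exp(-d²/(2·width²)) parametrisation is just one common convention that I am assuming, not necessarily the one used in the papers.

```python
import math

def rbf_kernel(x, z, width=1.0):
    """Gaussian RBF kernel, assumed form exp(-||x - z||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def predict(x, basis_points, weights, bias=0.0, width=1.0):
    """y(x) = bias + sum_i w_i * K(x, x_i), the form shared by SVR and RVM."""
    return bias + sum(
        w * rbf_kernel(x, xi, width) for w, xi in zip(weights, basis_points)
    )

# Hypothetical 1-D model: a zero weight stands for a kernel that was pruned,
# which is where the sparsity of both methods comes from.
basis_points = [(0.0,), (1.0,), (2.0,)]
weights = [0.5, 0.0, -0.25]

# Same weights, different kernel widths: the width still has to be chosen.
y_narrow = predict((0.0,), basis_points, weights, width=0.5)
y_wide = predict((0.0,), basis_points, weights, width=2.0)
print(y_narrow, y_wide)
```

The two methods differ in how the weights are found (and in how many end up non-zero), not in this final formula, which is why the kernel width survives as something to tune in both.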

#4 clearly states “Editor: Alex Smola”. Smola is one of the key early players in SVMs (e.g. as a co-author with Schölkopf on #2 above). Perhaps Smola switched to the RVM camp? AT&T vs. MSFT. Smola doesn’t seem to be as prominent as Schölkopf or, of course, Vapnik, but I have enjoyed quite a few hours of his lectures. Anyway, that’s enough speculation. Both theories are very interesting and practical, and both teams write good papers.

My main goal in posting things like this is to see if anyone else has papers they thought were interesting, or other ideas about the ones above. So please feel free to email me or leave a comment.