Analysis of a Bad Indicator

August 21, 2009

As I often do, I watched a video lecture on data analysis. Here’s the video: the Hilbert Spectrum. Here are the notes I took while watching it:


The idea is appealing: decompose a time series into underlying trends of different periodicities. In the trading world these might be a long-term macroeconomic trend, a monthly pattern around announcements of the federal funds rate, and a short-term pattern caused by supply and demand and liquidity constraints. The researcher in the video was trying to study ocean waves with satellite data. Obviously, ocean waves and market prices may behave quite differently.

I implemented the Hilbert spectrum algorithm because I was excited about it. Here’s the R script. For example, here’s what the spectrum looks like for GOOG & TYP share prices:

At the top is the actual price series, and below that are the series with the high-frequency patterns removed one by one. They look nice.

Here’s the code, hspect.r, in the language R. R is basically an advanced calculator that’s also programmable.

The problem is that this is a type of smoother: useful for summarizing and exploring data, but useless for extrapolation or prediction. Cubic spline interpolation and LOESS belong to the same family. If you extend these curves past the edge of the data to make predictions, the estimates will have extremely high variance. Making predictions with one of these smoothers is equivalent to throwing away almost all your data except the bit at the very end, and then fitting either a third-degree polynomial (in cubic spline interpolation) or a straight line (in LOESS) to it.
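To put a rough number on "extremely high variance", here is a small sketch in Python. It is not the code behind the charts above, and it is a deliberate simplification: it assumes the forecast comes from a cubic drawn exactly through the last four points, or a straight line drawn through the last two, with independent noise of equal variance in each observation. Real spline and LOESS fits differ in the details. The function names are made up for illustration.

```python
from fractions import Fraction

def lagrange_weights(xs, x_new):
    """Weights w_i such that the polynomial through the points (xs[i], y[i])
    takes the value sum(w_i * y[i]) at x_new."""
    weights = []
    for i, xi in enumerate(xs):
        w = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                w *= Fraction(x_new - xj, xi - xj)
        weights.append(w)
    return weights

def noise_amplification(xs, x_new):
    """Variance of the extrapolated value divided by the noise variance of a
    single observation (assumption: independent, equal-variance noise)."""
    return sum(w * w for w in lagrange_weights(xs, x_new))

# Cubic through the last four points (x = 0..3), extended one and two steps.
cubic_one_step = noise_amplification([0, 1, 2, 3], 4)
cubic_two_steps = noise_amplification([0, 1, 2, 3], 5)

# Straight line through the last two points (x = 2, 3), extended the same way.
line_one_step = noise_amplification([2, 3], 4)
line_two_steps = noise_amplification([2, 3], 5)

print("cubic:", cubic_one_step, cubic_two_steps)
print("line: ", line_one_step, line_two_steps)
```

Under these assumptions, the one-step cubic forecast is 69 times as noisy (in variance) as a single observation, and the two-step forecast is 741 times as noisy. The straight line is milder, at 5 and 13, but it still amplifies the noise instead of averaging it away.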

Cubic spline interpolation is especially insidious because most people don’t understand it, and the confusing name doesn’t help. Everyone knows how to interpret the first two derivatives: velocity and acceleration. The third derivative is interpretable in two different contexts, as curvature or as jerk. Jerk is what you feel when you’re standing in an elevator and it starts to go up. If the elevator is designed well, the jerk will be constant and you will barely feel it. It’s also important in roller coaster design, to ensure a smooth ride. In terms of curvature, if the third derivative is constant, the curve will be pleasing to the eye, as if it were drawn with sweeping hand motions. That’s the qualitative explanation. This latter interpretation, curvature, is what cubic spline interpolation is based on. It fits a nice-looking piecewise polynomial (one cubic between each pair of adjacent points) whose first and second derivatives match at each knot.
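Here is a minimal Python sketch of what matching derivatives at the knots means in practice. It builds a natural cubic spline, meaning the second derivative is set to zero at both ends. That end condition is my assumption, since other end conditions are common. The data points are made up, and `natural_cubic_spline` is a hypothetical helper written for this post, not a library function.

```python
from fractions import Fraction

def natural_cubic_spline(xs, ys):
    """Return one tuple (a, b, c, d, width) per interval, where the piece is
    S_i(x) = a + b*t + c*t**2 + d*t**3 with t = x - xs[i]."""
    n = len(xs) - 1
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    h = [xs[i + 1] - xs[i] for i in range(n)]
    slope = [(ys[i + 1] - ys[i]) / h[i] for i in range(n)]

    # Second derivatives m[0..n] at the knots; natural ends keep m[0] = m[n] = 0.
    m = [Fraction(0)] * (n + 1)
    diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6 * (slope[i] - slope[i - 1]) for i in range(1, n)]

    # Solve the tridiagonal system: forward elimination, then back substitution.
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    for k in range(n - 2, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    pieces = []
    for i in range(n):
        a = ys[i]
        b = slope[i] - h[i] * (2 * m[i] + m[i + 1]) / 6
        c = m[i] / 2
        d = (m[i + 1] - m[i]) / (6 * h[i])
        pieces.append((a, b, c, d, h[i]))
    return pieces

# Made-up data, purely for illustration.
pieces = natural_cubic_spline([0, 1, 2, 3, 4], [0, 1, 0, 2, 1])

first_gap, second_gap, third_jump = [], [], []
for (a, b, c, d, width), (a2, b2, c2, d2, _) in zip(pieces, pieces[1:]):
    # Compare the right end of one piece with the left end of the next.
    first_gap.append(b2 - (b + 2 * c * width + 3 * d * width ** 2))
    second_gap.append(2 * c2 - (2 * c + 6 * d * width))
    third_jump.append(6 * d2 - 6 * d)

print("first-derivative gaps: ", first_gap)
print("second-derivative gaps:", second_gap)
print("third-derivative jumps:", third_jump)
```

The first and second derivative gaps come out exactly zero at every interior knot, while the third derivative is constant within each piece and jumps from one piece to the next. The last piece is still just a cubic, and extending it past the final knot is what a spline "prediction" amounts to.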

Unfortunately, you have to understand these methods to know not to use them, and not to trust systems based on them. I’ve had people contact me about using cubic spline interpolation for prediction, but it’s just not applicable.

Feel free to add your own thoughts.