# Conceptualizing Learning Error

When you are trying to find the correct equation to model and predict financial data, you will always have some error. If you are using regression to predict the next period’s return, you will probably measure the accuracy by mean squared error (MSE).

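This error can be broken down into two components, which can be interpreted as its sources: Error = Bias^2 + Variance. Here the error is the expected squared difference between the prediction and the true value, E[(y-f(x))^2]. If the error is instead measured against noisy observations, a third, irreducible noise term is added.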
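The decomposition can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes a hypothetical true function f(x) = x^2, training sets of 20 noisy samples drawn uniformly from [-1, 1], and an arbitrary evaluation point x0 = 0.9. The names `true_f`, `predictions_at`, and `decompose` are illustrative only. It refits a polynomial on many independent training sets and estimates the bias, variance, and MSE of the prediction at x0.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "true" relationship for illustration: a parabola.
    return x ** 2

def predictions_at(x0, degree, n_trials=1000, n_train=20, noise_sd=0.3):
    """Fit a polynomial of the given degree to many independent noisy
    training sets and return each fit's prediction at x0."""
    preds = np.empty(n_trials)
    for i in range(n_trials):
        x = rng.uniform(-1.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise_sd, n_train)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x0)
    return preds

def decompose(x0, degree):
    """Estimate bias, variance, and MSE of the prediction at x0."""
    preds = predictions_at(x0, degree)
    bias = preds.mean() - true_f(x0)          # E[y] - f(x)
    variance = preds.var()                    # E[(y - E[y])^2]
    mse = np.mean((preds - true_f(x0)) ** 2)  # E[(y - f(x))^2]
    return bias, variance, mse

for degree in (1, 2, 8):
    bias, variance, mse = decompose(0.9, degree)
    print(f"degree {degree}: bias^2={bias**2:.4f} "
          f"variance={variance:.4f} mse={mse:.4f}")
```

Under these assumptions, the line (degree 1) should show a large squared bias, while the degree 8 fit should show little bias but a larger variance than the degree 2 fit. In every case the printed MSE equals bias^2 plus variance up to floating-point rounding, because the error is measured against the noiseless `true_f`.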

1) Bias is the error of the expected prediction relative to the optimal/true prediction (Bias = E[y]-f(x), where f(x) is the true prediction and y is the approximation).
For example, using a 1st degree polynomial (a line) to approximate a 2nd degree polynomial (a parabola) will intrinsically have some bias error because a line cannot match a parabola at all points.

2) Variance is the expected squared deviation of a prediction from the expected prediction (Var = E[(y-E[y])^2]).
For example, if you only have two sample data points, the function class of all 1st degree polynomials (ax+b) containing those two points will have no variance because only one line can go through the two points. However, the function class of all 2nd degree polynomials (ax^2+bx+c) will have higher variance because there are infinitely many parabolas that can be strung through two points. Therefore you will have higher generalization error when you test on out-of-sample data. Here’s a picture of both examples; focus on the 1st order and 50th order fits, both of which will clearly have high prediction error: