Overfitting II: Out-of-Sample Testing

Editor SDC
Last updated: 2009/03/01 at 11:03 PM
Previously I wrote a note on overfitting during training. With that in mind, let’s imagine a normal scenario:

You’re trying to find a strategy with an edge, and you’re considering three types: a moving-average crossover momentum strategy, an RSI-threshold strategy, and a buy-after-gap-down strategy. Being a modern quant trader, you know that regular, automatic parameter optimization is the only way to make an adaptive, fully automated system. The goal of system development, of course, is to determine which strategy is best.

After reading the previous note on overfitting you’re smart enough to have split your data into two sets, one for training and one for testing.

The training set is used with cross-validation to find the best parameters for each strategy. You are [separately] having it automatically optimize the two moving-average lengths, the RSI period, and the minimum downward gap threshold. Those are the obvious parameters. Then the out-of-sample test set is used to measure the performance of each strategy, generating PnL, max drawdown, Sharpe ratio, etc.

Following this, you compare the results and, based on the PnL curve and careful scrutiny, pick the best system.

What was the problem in the above? Considering three strategies introduced a hidden parameter that slipped past cross-validation. Go back and imagine a bigger system that holds a portfolio of strategies: MA, RSI, and gap-based, numbered 1, 2, 3. This system has an extra parameter s ∈ {1, 2, 3}, in addition to the parameters for each strategy mentioned above. When this bigger system goes through the cross-validation loop, one final result pops out. Previously we had three results and then chose the best, which means the value of s was chosen using the test set.

Picking the best of the three results on the test set is equivalent to overfitting on the training data. Convince yourself of this fact. The two appear different only because of the different purposes and names we have assigned to the ‘training’ and ‘test’ sets. In fact, picking a model at the end was itself a training step. Now generalize this equivalence between overfitting on the training set and on the test set to cases where the system follows a more complex adaptive strategy, with layers upon layers of auto-optimization and validation loops.
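A small simulation can help with the convincing. The sketch below assumes three strategies with no edge at all (their daily returns are pure Gaussian noise), picks the winner on one data set, and then scores that same winner on fresh data. The function names, the 250-day length, and the volatility are all made up for illustration.

```python
import random
import statistics

def noise_returns(n_days, rng, vol=0.01):
    """Daily returns with zero true edge (pure Gaussian noise)."""
    return [rng.gauss(0.0, vol) for _ in range(n_days)]

def one_trial(rng, n_strategies=3, n_days=250):
    """Pick the best strategy on one set, then score it on a fresh set."""
    picking = [noise_returns(n_days, rng) for _ in range(n_strategies)]
    fresh = [noise_returns(n_days, rng) for _ in range(n_strategies)]
    means = [statistics.fmean(r) for r in picking]
    s = max(range(n_strategies), key=means.__getitem__)
    return means[s], statistics.fmean(fresh[s])

rng = random.Random(0)
results = [one_trial(rng) for _ in range(200)]
avg_on_picking_set = statistics.fmean(r[0] for r in results)
avg_on_fresh_set = statistics.fmean(r[1] for r in results)

print("winner's mean daily return on the set used to pick it:", avg_on_picking_set)
print("winner's mean daily return on data it never saw:", avg_on_fresh_set)
```

Under these assumptions the winner looks profitable on the set that selected it and roughly flat on the untouched set, which is the same symptom as a parameter overfitted during training.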

Test-set overfitting is typically worse than in the example above because in most cases you will be considering more than three strategies. First example: you are haphazardly searching for some edge by trying any kind of strategy you can imagine. Second example (more insidious): you are testing different kernels on an SVM. You will think you have found that one kernel is more applicable to the domain of financial forecasting, but it is actually an illusion. Ignore ‘intrinsic’ meaning and just conceptualize any set of options as a parameter list (unfortunately a combinatorially large one).
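How much worse it gets with more candidates can be sketched numerically. The code below is a toy setup, not a backtest: every candidate is assumed to have zero true edge, and `sharpe` is a plain per-period mean over standard deviation. It reports the average best test-set Sharpe when you try 1, 3, 10, or 30 such candidates.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio (mean / standard deviation), not annualized."""
    return statistics.fmean(returns) / statistics.stdev(returns)

def noise_sharpes(n_strategies, n_days, rng):
    """Test-set Sharpe ratios of strategies whose true edge is zero."""
    return [
        sharpe([rng.gauss(0.0, 1.0) for _ in range(n_days)])
        for _ in range(n_strategies)
    ]

rng = random.Random(0)
# 100 independent trials, each with 30 no-edge candidates over 250 days.
trials = [noise_sharpes(30, 250, rng) for _ in range(100)]

# Average, over trials, of the best Sharpe among the first k candidates.
avg_best = {
    k: statistics.fmean(max(t[:k]) for t in trials) for k in (1, 3, 10, 30)
}
for k, value in avg_best.items():
    print(f"{k:>2} candidates tried -> average best Sharpe {value:.3f}")
```

The best of the first k candidates can never fall as k grows, so the averages are non-decreasing by construction. The reason to run it is to see how large an apparent edge noise alone produces once the candidate list is long.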

—
This part is just me thinking of ideas and writing. It’s a bit off the deep end: you should stop here unless the top part sounded like old news and was 100% intuitive on the first read-through. —

Hypothetically speaking, if the system had been trained and tested on an infinite amount of data, overfitting would not be a problem (as long as the number of parameters is finite (??)). And I don’t mean including all time periods (e.g., take every other period: still infinite, but not all of time, and overfitting would still not be a problem). Unless you test on all the data that will occur in the future, and not just your out-of-sample set (obviously impossible), you risk fitting the expression of noise that is specific to that set. You will think you have found a pattern in the stock market, when really you have found a pattern in the noise. All finite sets of numbers have patterns, for example the list of all the numbers repeated once. If this is the only pattern, and no sequence repeats more than once, then you will not suffer from too much overfitting even if you follow a flawed procedure as described above. The noise only truly becomes noise once it is infinitely long and there are no more persistent patterns. ‘Until that point’ it will not be perfect noise, and you must be wary of it.

When you test on anything less than infinite data, you risk selecting the fateful subset of the data that your system happens to predict perfectly. Fortunately, your odds of selecting a highly patterned set from the noise decrease exponentially as you use a larger test set (1 / k^n). Just remember that the possibility exists in the universe that this was all by chance. [Maybe the laws of physics are false and every human observation until now has simply happened to be perfectly correlated with some perfectly meaningless, unrelated formulas Newton happened upon.]
——
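The post does not define k and n in the 1 / k^n figure. One possible reading, assumed here, is that each of the n test points has k equally likely outcomes, so a system with no edge predicts all of them correctly with probability (1/k)^n. The sketch below works that out exactly, and also shows how the chance rises when m independent no-edge systems are tried (m and both function names are additions for illustration).

```python
from fractions import Fraction

def chance_of_perfect_fit(k, n):
    """Probability that a no-edge guesser gets all n of k-way outcomes right."""
    return Fraction(1, k) ** n

def chance_any_perfect(k, n, m):
    """Probability that at least one of m independent no-edge guessers is perfect."""
    p = chance_of_perfect_fit(k, n)
    return 1 - (1 - p) ** m

# Up/down outcomes (k = 2), growing test set, 1000 candidate systems.
for n in (10, 20, 40):
    single = float(chance_of_perfect_fit(2, n))
    any_of_many = float(chance_any_perfect(2, n, 1000))
    print(f"n={n}: one system {single:.3g}, best of 1000 systems {any_of_many:.3g}")
```

Under this reading a longer test set shrinks the chance exponentially, while a longer list of candidates pushes it back up roughly in proportion to m.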

If you can’t recognize all incarnations of overfitting, you will not be able to accurately test a self-adapting system. You can’t even get to the point of looking for an edge of this type because you don’t know how to see.

I would like to see research that goes into more depth on overfitting, beyond what I’ve mentioned, so please leave a comment if you know of a source.

Editor SDC March 1, 2009