How to Cheat with Data Mining

SandroSaitta

Usually data miners don’t cheat. The reason is simple: you cannot cheat with the future. In reality, it’s a bit more complicated. A data miner may be cheating without knowing it. Here are a few examples:

First, one may cheat by learning the training set by heart. If you cheat in this way on your training set, it will show up on the test set, where performance drops well below what you saw in training (overfitting).
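To make the overfitting case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data is synthetic with coin-flip labels (so there is nothing real to learn), and `Memorizer` is a hypothetical model that simply stores the training set and answers with the label of the nearest stored point.

```python
import random

rng = random.Random(0)

def make_data(n):
    # Synthetic data: the feature is a random number and the label is a
    # coin flip, so no model can genuinely predict the label.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]

class Memorizer:
    """Hypothetical model that learns the training set by heart."""

    def fit(self, data):
        self.examples = list(data)
        return self

    def predict(self, x):
        # Return the label of the closest stored training point.
        _, label = min(self.examples, key=lambda ex: abs(ex[0] - x))
        return label

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

train = make_data(200)
test = make_data(1000)

model = Memorizer().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The model recalls every training example perfectly, but on unseen data it can do no better than guessing. That gap between the two numbers is how the cheat becomes visible.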

Another way of cheating is to use predictor variables that you don’t have the right to use. For example, you cannot use tomorrow’s value of the euro to predict tomorrow’s value of the dollar. That case is straightforward, but the same thing can happen in more subtle ways, believe me.
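The euro/dollar example can be sketched as follows. This is an illustration under stated assumptions, not real market data: the two series are synthetic daily moves driven by a shared same-day shock, and the names `honest_feature` and `leaky_feature` are made up for this sketch. The only difference between the two features is a one-day shift in the index.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation, written out to avoid extra dependencies.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n_days = 2000

# Assumption: both currencies react to the same daily shock, plus
# a little independent noise each. Days are independent of each other.
shock = [rng.gauss(0, 1) for _ in range(n_days)]
eur = [s + 0.3 * rng.gauss(0, 1) for s in shock]
usd = [s + 0.3 * rng.gauss(0, 1) for s in shock]

# For each "today" t, the target is tomorrow's dollar move usd[t + 1].
target = usd[1:]
honest_feature = eur[:-1]  # today's euro move: known when predicting
leaky_feature = eur[1:]    # tomorrow's euro move: not yet known

honest_corr = pearson(honest_feature, target)
leaky_corr = pearson(leaky_feature, target)

print(f"correlation using today's euro:    {honest_corr:.2f}")
print(f"correlation using tomorrow's euro: {leaky_corr:.2f}")
```

In this setup the leaky feature looks like an excellent predictor, while the honest one carries almost no signal. In a real pipeline the same off-by-one shift can hide inside a join, a resampling step, or a rolling window.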

The next one is much more insidious. You use past data to predict the evolution of some given stocks. You train and test your model on past years (backtesting). For that, you choose a set of stocks (as of the end of 2010) that were also present in 2001. This way, the backtest can be performed over 10 years. Results are good over the last 10 years, so the strategy can be launched.

But you are cheating, in a subtle way. You are not overfitting, and neither are you using forbidden predictor variables. So, what’s the problem? You are using information you are not allowed to have. While backtesting, you should not know in 2001 which companies will still be there at the end of 2010. By doing so, you select only successful companies and thus cheat on your predicted return (this is commonly called survivorship bias). Subtle, isn’t it?
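A small simulation can show the size of this effect. The sketch below rests on assumptions chosen only for illustration: 2,000 synthetic stocks, yearly returns with zero expected gain, and a hypothetical rule that a stock is delisted once it falls below 30% of its starting value. It compares the average 10-year return of the full 2001 universe with that of the stocks still listed at the end.

```python
import math
import random

rng = random.Random(42)

N_STOCKS = 2000
N_YEARS = 10
SIGMA = 0.3          # assumed yearly volatility of log returns
MU = -SIGMA**2 / 2   # drift chosen so the expected yearly return is zero
DELIST_BELOW = 0.3   # hypothetical delisting threshold (30% of start value)

def simulate_stock():
    """Return (final_value, survived) for one stock starting at 1.0."""
    value = 1.0
    for _ in range(N_YEARS):
        value *= math.exp(rng.gauss(MU, SIGMA))
        if value < DELIST_BELOW:
            return value, False  # delisted: the investor keeps the loss
    return value, True

results = [simulate_stock() for _ in range(N_STOCKS)]

universe = [value for value, _ in results]
survivors = [value for value, survived in results if survived]

# Total return over the period is final value minus the starting value of 1.
universe_mean = sum(universe) / len(universe) - 1.0
survivors_mean = sum(survivors) / len(survivors) - 1.0

print(f"stocks in 2001:            {len(universe)}")
print(f"stocks still listed later: {len(survivors)}")
print(f"mean return, full universe:  {universe_mean:+.1%}")
print(f"mean return, survivors only: {survivors_mean:+.1%}")
```

No strategy is involved here at all: the difference between the two averages comes purely from selecting the stocks with hindsight. An honest backtest would pick its universe using only the stocks known in 2001 and would keep the delisted ones in the results.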
