SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

How to Cheat with Data Mining

SandroSaitta
Last updated: 2011/08/06 at 1:45 PM

Usually data miners don’t cheat. The reason is simple: you cannot cheat with the future. In reality, it’s a bit more complicated. A data miner may be cheating without knowing it. Here are a few examples:

First, one may cheat by learning the training set by heart. However you cheat on your training set, it will show up as poor performance on the test set (overfitting).
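
To make the first point concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real modeling workflow: the `Memorizer` class and the coin-flip dataset are hypothetical, chosen so that there is nothing genuine to learn. A model that memorizes the training set scores perfectly on it, yet does no better than chance on fresh data.

```python
import random


def make_noise_dataset(n, rng):
    """Random features with coin-flip labels: there is no real pattern to learn."""
    return [((rng.random(), rng.random()), rng.randint(0, 1)) for _ in range(n)]


class Memorizer:
    """Hypothetical 'model' that stores every training example verbatim."""

    def fit(self, data):
        self.table = {x: y for x, y in data}
        ones = sum(y for _, y in data)
        # Fall back to the majority training label for unseen inputs.
        self.default = 1 if ones * 2 >= len(data) else 0
        return self

    def predict(self, x):
        return self.table.get(x, self.default)


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_noise_dataset(1000, rng)
test = make_noise_dataset(1000, rng)

model = Memorizer().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The gap between the two printed numbers is the signature of overfitting: the training score says nothing about how the model behaves on data it has not seen.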

Another way of cheating is to use predictor variables that you don’t have the right to use. For example, you don’t have the right to use tomorrow’s value of the euro to predict tomorrow’s value of the dollar. Straightforward, but this may happen in more subtle ways, believe me.
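
The euro and dollar example can be sketched with simulated data. Everything below is an assumption made for illustration: the two series are synthetic daily changes that share a same-day shock and are independent from one day to the next, and `pearson` is a small helper written for this sketch. The point is only that a predictor taken from the same day as the target looks excellent, while the version you could actually have known in advance carries almost no signal.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
n_days = 2000

# Synthetic daily changes: both currencies react to the same shock on the
# same day, plus their own noise. Days are independent of each other.
euro, dollar = [], []
for _ in range(n_days):
    shock = rng.gauss(0, 1)
    euro.append(shock + rng.gauss(0, 0.5))
    dollar.append(shock + rng.gauss(0, 0.5))

# Cheating: predict tomorrow's dollar change from tomorrow's euro change.
leaky_corr = pearson(euro[1:], dollar[1:])

# Honest: predict tomorrow's dollar change from today's euro change.
honest_corr = pearson(euro[:-1], dollar[1:])

print(f"same-day predictor (not available in advance): {leaky_corr:.2f}")
print(f"previous-day predictor (available in advance): {honest_corr:.2f}")
```

A simple habit that follows from this sketch: for every predictor column, ask at what time its value becomes known, and check that this is before the moment the prediction has to be made.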

The next one is much more insidious. You use past data to predict the evolution of some given stocks. You train and test your model on past years (backtesting). For that, a set of stocks is chosen (as of the end of 2010) that were also present in 2001. This way, the backtest can be performed over 10 years. Results are good over the last 10 years, so the strategy can be launched.

But you are cheating, in a subtle way. You are not overfitting, and neither are you using forbidden predictor variables. So, what’s the problem? You are using information you are not allowed to have. While backtesting, you should not know in 2001 which companies will still be there at the end of 2010. By doing so, you select only successful companies and thus cheat on your predicted return. Subtle, isn’t it?
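
This effect is commonly called survivorship bias, and a small simulation shows its size. The sketch below rests on assumptions that are not in the original post: stock prices follow a random walk with zero expected return, a company is delisted once its price falls below a hypothetical threshold, and the constants are arbitrary. Selecting only the companies still alive at the end inflates the average return compared with the full universe that was actually investable at the start.

```python
import random

YEARS = 10           # length of the backtest, as in the 2001 to 2010 example
N_STOCKS = 2000      # assumed size of the starting universe
SIGMA = 0.4          # assumed yearly volatility of log returns
DELIST_BELOW = 0.3   # assumed price (relative to 1.0) at which a company disappears


def simulate_stock(rng):
    """Return (final_price, survived) for one stock starting at price 1.0."""
    price = 1.0
    for _ in range(YEARS):
        # Lognormal yearly factor with mean 1.0, so the expected return is zero.
        price *= rng.lognormvariate(-SIGMA ** 2 / 2, SIGMA)
        if price < DELIST_BELOW:
            return price, False  # delisted: the investor keeps the loss
    return price, True


def mean_return(prices):
    return sum(p - 1.0 for p in prices) / len(prices)


rng = random.Random(42)
universe = [simulate_stock(rng) for _ in range(N_STOCKS)]
survivors = [price for price, alive in universe if alive]

universe_return = mean_return([price for price, _ in universe])
survivor_return = mean_return(survivors)

print(f"stocks alive at the end: {len(survivors)} of {N_STOCKS}")
print(f"mean 10-year return, full starting universe: {universe_return:+.2%}")
print(f"mean 10-year return, survivors only:         {survivor_return:+.2%}")
```

An honest backtest would build its stock list from what was known at the start date (here, the full 2001 universe, including companies that later disappeared) rather than from the list of companies that exist at the end.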
