Ten ways to build a wrong scoring model

Below are some ways to build a wrong scoring model. The author makes no guarantee that a modeling team using any one of them will still end up with a correct model.

1) Over-fit the model to the sample. You can check for over-fitting by drawing a fresh random sample, scoring it with the same equation, and comparing predicted conversion rates with actual conversion rates, decile by decile. An over-fit model does not rank order: deciles with a lower average probability show as many conversions as, or more than, deciles with higher probability scores.
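The decile check from point 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not a production validation routine: the function names (`decile_table`, `rank_orders`) and the simulated holdout data are hypothetical, and the "model" is assumed to have already produced a probability score for every record in the fresh random sample.

```python
import random


def decile_table(scores, outcomes, n_bins=10):
    """Sort records by score (highest first), cut them into n_bins equal-sized
    groups and return (average predicted probability, actual conversion rate)
    for each group."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    table = []
    for i in range(n_bins):
        chunk = ranked[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # fewer records than bins
        avg_predicted = sum(score for score, _ in chunk) / len(chunk)
        actual_rate = sum(outcome for _, outcome in chunk) / len(chunk)
        table.append((avg_predicted, actual_rate))
    return table


def rank_orders(table):
    """True if the actual conversion rate strictly falls from each decile to
    the next lower-scored one; False is the over-fitting symptom."""
    actual = [rate for _, rate in table]
    return all(higher > lower for higher, lower in zip(actual, actual[1:]))


# Hypothetical holdout: 20,000 fresh random records, already scored by the
# equation under test. Outcomes are simulated so that the true conversion
# probability equals the score, i.e. a model that should rank order.
rng = random.Random(42)
holdout_scores = [rng.random() for _ in range(20_000)]
holdout_outcomes = [1 if rng.random() < s else 0 for s in holdout_scores]

table = decile_table(holdout_scores, holdout_outcomes)
for decile, (predicted, actual) in enumerate(table, start=1):
    print(f"decile {decile:2d}: predicted {predicted:.3f}  actual {actual:.3f}")
print("rank orders:", rank_orders(table))
```

On real data you would pass the scores your equation assigns to the new sample and the conversions you actually observed; a `False` from `rank_orders` is the warning sign described in point 1.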

2) Choose non-random samples for building and validating the scoring equation. See over-fitting above.
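A random build/validation split is cheap to do properly. The sketch below is one way to do it with Python's standard `random` module, assuming the records fit in memory; `random_split`, the 30% validation share and the customer IDs are all hypothetical choices for illustration.

```python
import random


def random_split(records, validation_share=0.3, seed=2024):
    """Shuffle a copy of the records with a fixed seed, then cut it into a
    build sample and a validation sample. Every record has the same chance of
    landing in either sample, regardless of how the input happens to be sorted."""
    if not 0 < validation_share < 1:
        raise ValueError("validation_share must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * validation_share)
    return shuffled[cut:], shuffled[:cut]


# Hypothetical customer IDs, sorted by signup date. Taking the first 700 as
# the build sample would be a non-random (time-ordered) split.
customer_ids = list(range(1, 1001))
build, validation = random_split(customer_ids)
print(len(build), len(validation))  # prints "700 300"
```

The fixed seed makes the split reproducible, so the same build and validation samples can be recreated when the model is reviewed later.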

3) Use multicollinearity checks without business judgment, removing variables that may make business sense. This usually happens a few years after you studied (and then forgot) multicollinearity…

If you don’t know the difference between multicollinearity and heteroscedasticity, this could be the real deal-breaker for you.
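One way to keep business judgment in the loop is to have the collinearity check only *report* suspicious pairs and leave the dropping to a person. The sketch below, under that assumption, flags variable pairs with a high Pearson correlation and shows the pairwise variance inflation factor 1 / (1 - r²); the 0.8 threshold and the variable names and values are hypothetical, and a full VIF would regress each variable on all the others rather than one at a time.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (no constant columns)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r, vif) for every pair with |r| >= threshold.

    vif is 1 / (1 - r**2), the variance inflation factor each variable would
    have in a model containing only that pair. The function only reports;
    deciding which variable (if any) to drop is left to business judgment."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            vif = math.inf if abs(r) >= 1 else 1 / (1 - r * r)
            flagged.append((a, b, r, vif))
    return flagged


# Hypothetical variables for five customers.
columns = {
    "income": [30, 45, 60, 75, 90],
    "balance": [3.1, 4.4, 6.2, 7.4, 9.1],
    "tenure": [5, 1, 4, 2, 3],
}
for a, b, r, vif in flag_collinear_pairs(columns):
    print(f"{a} / {b}: r = {r:.3f}, pairwise VIF = {vif:.1f}")
```

Here only `balance` and `income` are flagged; whether to keep one, both, or a ratio of the two is a business question the code deliberately does not answer.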

4) Using legacy code to run scoring, usually with stepwise forward and backward regression. This usually happens on Fridays, when you’re in a hurry to build models.

5) Ignoring the signs or magnitudes of parameter estimates (the coefficients in the output, i.e. the weight each variable carries in the equation).
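A simple guard is to write down, before fitting, the direction you expect each variable to push the score, and compare it with the fitted estimates. The sketch below assumes logistic-regression coefficients held in a plain dictionary; `review_estimates`, the expected signs, the coefficient values and the `max_abs_coef` cut-off are all hypothetical, and a sensible magnitude limit depends on how each variable is scaled.

```python
def review_estimates(estimates, expected_signs, max_abs_coef=5.0):
    """List coefficients whose sign contradicts the expected business direction
    (+1 or -1 in expected_signs) or whose absolute size exceeds max_abs_coef."""
    issues = []
    for name, coef in sorted(estimates.items()):
        expected = expected_signs.get(name)
        if expected is not None and coef * expected <= 0:
            issues.append((name, coef, "sign differs from expected direction"))
        if abs(coef) > max_abs_coef:
            issues.append((name, coef, f"magnitude above {max_abs_coef}"))
    return issues


# Hypothetical logistic-regression estimates (log-odds per unit) and the
# directions the business would expect.
estimates = {
    "months_since_last_purchase": 0.04,
    "prior_responses": 0.65,
    "income_band": 9.2,
}
expected_signs = {"months_since_last_purchase": -1, "prior_responses": +1}

for name, coef, reason in review_estimates(estimates, expected_signs):
    print(f"{name}: {coef:+.2f} ({reason})")
```

A flagged estimate is not automatically wrong, but it deserves an explanation (a data error, a proxy effect, collinearity) before the model goes out.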

6) Not knowing the difference between Type I and Type II errors, especially when rejecting variables based on their p-values.
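A small simulation makes the distinction concrete. The sketch below uses a two-proportion z-test as a stand-in for a variable's significance test (our assumption; a regression's p-values come from a different test but behave similarly). When the variable has no real effect, the share of p-values below 0.05 estimates the Type I error rate; when it does have a real but modest effect (a hypothetical 6% vs 5% conversion rate with 1,000 records per group), the share of p-values at or above 0.05 estimates the Type II error rate, i.e. how often a useful variable would be rejected.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of the pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both groups
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant(rate_a, rate_b, n_per_group, trials=500, alpha=0.05, seed=7):
    """Simulate many samples and return the share with p-value < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < rate_a for _ in range(n_per_group))
        conv_b = sum(rng.random() < rate_b for _ in range(n_per_group))
        if two_proportion_p_value(conv_a, n_per_group, conv_b, n_per_group) < alpha:
            hits += 1
    return hits / trials


# No real effect: every "significant" result is a Type I error.
type_i_rate = share_significant(0.05, 0.05, 1000)
# Real effect (6% vs 5% conversion): every p >= 0.05, i.e. dropping the
# variable, is a Type II error.
type_ii_rate = 1 - share_significant(0.06, 0.05, 1000)
print(f"Type I rate:  {type_i_rate:.3f}")
print(f"Type II rate: {type_ii_rate:.3f}")
```

With these settings the Type I rate should land near 0.05, while the Type II rate comes out high: most samples fail to show the real effect, so a p-value cut-off alone would usually throw the variable away.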

7) Excessive zeal in removing variables. Why remove it? Ask yourself that question every time you drop a variable.

8) Using the wrong causal event (such as loan mailings) to predict the future with a scoring model (for deposit-account mailings). Or using the right causal event in the wrong environment: sales rising or falling rapidly because of factors not in the model, such as a competitor entering or going out of business, oil prices, or credit shocks (sob sob sigh).
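The post does not name a method for spotting a changed environment; one check commonly used on scoring models (our choice here) is the population stability index (PSI), which compares the score distribution of the build sample with that of the population being scored now. The sketch below is a minimal version with decile bins taken from the baseline; the simulated scores are hypothetical, and the thresholds often quoted as a rule of thumb (below 0.1 stable, above 0.25 a major shift) are conventions rather than statistical guarantees.

```python
import bisect
import math
import random


def population_stability_index(baseline, current, bins=10):
    """PSI between a baseline score sample and a current one, using bin edges
    taken from the baseline's quantiles. Empty bins are floored at 1e-6 so the
    logarithm stays finite."""
    ordered = sorted(baseline)
    edges = [ordered[len(ordered) * i // bins] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), 1e-6) for c in counts]

    base_shares, curr_shares = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_shares, curr_shares))


# Hypothetical scores: the build-time population versus a later one in which
# most customers now score lower (say, after a competitor entered the market).
rng = random.Random(1)
build_scores = [rng.random() for _ in range(10_000)]
current_scores = [s ** 2 for s in build_scores]
print(f"PSI vs itself:  {population_stability_index(build_scores, build_scores):.3f}")
print(f"PSI vs shifted: {population_stability_index(build_scores, current_scores):.3f}")
```

A high PSI does not tell you *why* the population moved, only that the equation is now being applied to customers who look different from the ones it was built on.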

9) Over-fitting.

10) Learning about creating models from blogs and not reading and refreshing your old statistics textbooks.