# Outlier Analysis: Chebyschev Criteria vs Approach Based on Mutual Information

**792**Views

…Usually such behavior is not proficient to obtain good results, but this time I think that the change of prospective has been positive!

**As often happens, I usually do many thing in the same time, so during a break while I was working for a new post on applications of mutual information in data mining, I read the interesting paper suggested by Sandro Saitta on his blog (dataminingblog) related to the outlier detection.**

**…Usually such behavior is not proficient to obtain good results, but this time I think that the change of prospective has been positive! Chebyshev Theorem In many real scenarios (under certain conditions) the Chebyshev Theorem provides a powerful algorithm to detect outliers. The method is really easy to implement and it is based on the distance of Zeta-score values from k standard deviation. …Surfing on internet you can find several explanations and theoretical explanation of this pillar of the Descriptive Statistic, so I don’t want increase the Universe Entropy explaining once again something already available and better explained everywhere 🙂 Approach based on Mutual Information Before I explain my approach I have to say that I have not had time to check in literature if this method has been already implemented (please drop a comment if someone finds out a reference! … I don’t want take improperly credits). The aim of the method is to remove iteratively the sorted Z-Scores till the mutual information between the Z-Scores and the candidates outlier I(Z|outlier) increases. At each step the candidate outlier is the Z-score having the highest absolute value.**

** Basically, respect the Chebyschev method, there is no pre-fixed threshold.**

** Experiments I compared the two methods through canonical distribution, and at a glance it seems that results are quite good.**

Test on Normal Distribution |

**As you can see in the above experiment the Mutual information criteria seems more performant in the outlier detection.**

Test on Normal Distribution having higher variance |

** The following experiments have been done with Gamma Distribution and Negative Exponential**

Results on Gamma seem comparable. |

Experiment done using Negative Exponential distribution |

**…In the next days I’m going to test the procedure on data having multimodal distribution. Stay TunedCristian**

You must log in to post a comment.