Abstract

Traditional outlier detection methods build a model of the data and then label as outliers the objects that deviate significantly from this model. However, when the data contain many outliers, the outliers also pollute the model. The model then becomes unreliable, rendering most outlier detectors ineffective. To solve this problem, we propose a mean-shift outlier detector. This detector employs a mean-shift technique to modify the data and cancel the bias caused by the outliers. The mean-shift technique replaces every object with the mean of its k-nearest neighbors, which essentially removes the effect of outliers before clustering without the need to know which objects are outliers. In addition, it detects outliers based on the distance each object is shifted. Our experiments show that the proposed method works well regardless of the number of outliers in the data. The method outperforms all state-of-the-art methods tested on real-world numeric datasets as well as on generated numeric and string datasets.
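The two ideas in the abstract, replacing each object by the mean of its k-nearest neighbors and scoring it by the distance it was shifted, can be illustrated with a minimal sketch for numeric data. It is not the authors' implementation. The function names `knn_mean_shift` and `shift_scores`, the choice to count each point among its own neighbors, the Euclidean distance, and the `iterations` parameter are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def knn_mean_shift(X, k):
    """Replace every point by the mean of its k nearest neighbors.

    Assumption: a point counts as its own nearest neighbor (distance 0).
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k closest points for every row.
    idx = np.argsort(dist, axis=1)[:, :k]
    # Mean over the k neighbors, shape (n, d).
    return X[idx].mean(axis=1)


def shift_scores(X, k=3, iterations=3):
    """Outlier score = distance between a point and its shifted position.

    Assumption: the shift is repeated `iterations` times and the score is
    measured against the original coordinates.
    """
    X = np.asarray(X, dtype=float)
    Y = X.copy()
    for _ in range(iterations):
        Y = knn_mean_shift(Y, k)
    return np.linalg.norm(Y - X, axis=1)


if __name__ == "__main__":
    # Five points in a tight group and one point far away.
    data = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]])
    scores = shift_scores(data, k=3, iterations=1)
    print("index with the largest shift:", int(np.argmax(scores)))
```

Points inside a dense group barely move because their neighbors surround them, while an isolated point is pulled toward the nearest group and so receives a large score. String data would need a different distance and a different notion of "mean", which this sketch does not cover.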
