A New Outlier Detection Method Based on Machine Learning

Yafei Lv, Xiaohan Zhang, Zhenyu Xiong, Mi Cai, Yaqi Cui, Xiangqi Gu

doi:10.1109/icsidp47821.2019.9173217

Abstract

To address practical problems in outlier detection, such as unreasonable assumptions, uncertain thresholds, and repeated manual debugging, a novel outlier detection method based on machine learning is proposed. First, an outlier factor is calculated for each data object by unsupervised learning; in this paper, the Isolation Forest algorithm is adopted for this step. Then an outlier detection model is established by recasting outlier detection as a binary classification problem: the outlier factor threshold is learned from a labeled dataset by supervised learning, and is generated by a gradually progressive grid search. The proposed method is further fused with other methods through ensemble learning, which improves outlier detection accuracy. We conduct extensive experiments on several datasets and compare the new method with the Z-score method; our method achieves better performance in terms of accuracy and effectiveness.
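The threshold-learning step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "gradually progressive" grid search means a coarse-to-fine search that repeatedly narrows the interval around the best threshold found so far, and it assumes accuracy as the selection criterion. The outlier factors are taken as given, with higher values meaning more anomalous (for example, the negated output of scikit-learn's `IsolationForest.score_samples`). The function names, the toy scores, and the `rounds` and `points` parameters are hypothetical.

```python
def accuracy(scores, labels, threshold):
    """Fraction of objects classified correctly when score > threshold means outlier."""
    hits = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(scores)


def progressive_grid_search(scores, labels, rounds=4, points=11):
    """Coarse-to-fine search for the outlier factor threshold (assumed reading).

    Each round evaluates `points` evenly spaced thresholds on [lo, hi], keeps
    the best one seen so far, and shrinks the interval to one grid step on
    either side of it.
    """
    lo, hi = min(scores), max(scores)
    best_t, best_acc = lo, accuracy(scores, labels, lo)
    for _ in range(rounds):
        step = (hi - lo) / (points - 1)
        if step == 0:  # all scores identical: nothing left to refine
            break
        for i in range(points):
            t = lo + i * step
            acc = accuracy(scores, labels, t)
            if acc > best_acc:  # strict: keep the first threshold reaching the best accuracy
                best_t, best_acc = t, acc
        lo, hi = best_t - step, best_t + step
    return best_t, best_acc


# Toy labeled data (hypothetical): outlier factors and 0/1 labels, 1 = outlier.
scores = [0.31, 0.35, 0.38, 0.40, 0.42, 0.45, 0.71, 0.78]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

threshold, acc = progressive_grid_search(scores, labels)
print(f"learned threshold = {threshold:.3f}, training accuracy = {acc:.2f}")
```

On imbalanced data, where outliers are rare, a criterion such as F1 or recall at a fixed precision may be a better choice than accuracy; swapping it in only requires replacing the `accuracy` helper.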

Full Text