Effective Outlier Detection based on Bayesian Network and Proximity

Sha Lu, Jiuyong Li, Lin Liu, Thuc Duy Le

doi:10.1109/bigdata.2018.8622230

Abstract

Outliers are objects that differ significantly from the others in the same dataset. They often contain insightful information for understanding the data and the data generation process. Traditional outlier detection methods can generally be divided into two categories: model-based and proximity-based approaches. A new type of model-based approach, recently proposed, uses the Bayesian network (BN) framework to discover more meaningful outliers with better interpretability. BN-based methods yield very good detection results when anomalousness is mainly due to the violation of the dependencies among variables. However, when anomalousness is caused by reasons other than dependency violation, they perform very poorly. To address this problem, we propose an ensemble outlier detection method that combines BN-based and proximity-based techniques to achieve more stable outlier detection results across different scenarios. To the best of our knowledge, the proposed method is the first to bring together the two major categories of outlier detection techniques. Comprehensive experiments were conducted on both synthetic and real-world datasets, and the results show that our method outperforms the baseline methods in most cases.

Full Text