A new density-based subspace selection method using mutual information for high dimensional outlier detection

Mahboobeh Riahi-Madvar, Ahmad Akbari Azirani, Babak Nasersharif, Bijan Raahemi

doi:10.1016/j.knosys.2020.106733

Abstract

Outlier detection in high dimensional data faces the challenge of the curse of dimensionality, where irrelevant features may prevent the detection of outliers. In this research, we propose a novel, efficient, unsupervised density-based subspace selection method for outlier detection in the projected subspace. First, the Maximum-Relevance-to-Density (MRD) algorithm is proposed to select the relevant subspace based on mutual information. Then, applying the concept of redundancy among features, we present an efficient relevant subspace selection method called minimum-Redundancy-Maximum-Relevance-to-Density (mRMRD). Finally, the degree of outlierness of data points in the corresponding relevant subspace is computed based on the Local Outlier Factor (LOF). Experimental results on both real and synthetic data demonstrate that the proposed algorithms, based on the MRD and mRMRD criteria, increase the accuracy of outlier detection while reducing computational complexity and execution time. Moreover, as the dimensionality increases, the accuracy of outlier detection on the mRMRD-based relevant subspace is higher than on the MRD-based relevant subspace. This verifies that the proposed mRMRD-based subspace selection algorithm can efficiently select the subspace by considering the relevance between features.
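The selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact MRD or mRMRD criteria, so the following are assumptions made for illustration only. Density is approximated by the inverse mean distance to the k nearest neighbours in the full space, mutual information is a plug-in estimate over equal-width bins, and the redundancy term is the mean mutual information with already selected features, in the style of generic mRMR. All function names and parameters are hypothetical, and the final LOF scoring step is omitted.

```python
import math
from collections import Counter

def knn_density(points, k=5):
    """Density proxy (assumption): inverse mean distance to the k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return dens

def discretize(values, bins=8):
    """Map real values to integer labels using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def select_subspace(points, n_select, k=5, bins=8, use_redundancy=True):
    """Greedily pick features by relevance to density, optionally minus redundancy.

    use_redundancy=False ranks by relevance only (an MRD-like criterion);
    use_redundancy=True subtracts mean MI with chosen features (mRMRD-like).
    """
    dims = len(points[0])
    cols = [discretize([p[d] for p in points], bins) for d in range(dims)]
    dens = discretize(knn_density(points, k), bins)
    relevance = [mutual_information(col, dens) for col in cols]
    chosen = []
    while len(chosen) < n_select:
        best, best_score = None, -math.inf
        for d in range(dims):
            if d in chosen:
                continue
            redundancy = 0.0
            if use_redundancy and chosen:
                redundancy = sum(mutual_information(cols[d], cols[s])
                                 for s in chosen) / len(chosen)
            score = relevance[d] - redundancy
            if score > best_score:
                best, best_score = d, score
        chosen.append(best)
    return chosen
```

Calling `select_subspace(points, n_select=3)` on a list of equal-length numeric tuples returns three feature indices; the data would then be projected onto those features and scored with an LOF implementation of choice.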

Full Text