Attribute-weighted outlier detection for mixed data based on parallel mutual information

Junli Li, Zhanfeng Liu

doi:10.1016/j.eswa.2023.121304

Abstract

Outlier detection plays an important role in data mining because it can improve the performance of data analysis. Most outlier detection algorithms handle only numerical or only categorical attributes; however, data typically contain a mixture of both. We address this problem by developing an attribute-weighted outlier detection algorithm, PMIOD, for high-dimensional, massive mixed data. PMIOD uses mutual information to measure the correlations between attributes and provides an attribute-weighting method for mixed data. Based on these weights, an attribute-weighted outlier detection method for mixed data is developed. Moreover, to improve the efficiency of computing mutual information for high-dimensional mixed data, the computation is parallelized on the Spark platform. We evaluated the proposed algorithm on ten UCI datasets and four synthetic datasets in comparison with widely used algorithms. The experimental results demonstrate the superiority of the proposed algorithm.
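
To make the idea of mutual-information-based attribute weighting concrete, the following is a minimal sketch and not the PMIOD algorithm itself. It assumes that numerical attributes are first discretized by equal-width binning and that an attribute's weight is its mean mutual information with all other attributes, normalized to sum to 1. The abstract states neither choice, and the function names (`discretize`, `mutual_information`, `attribute_weights`) and the toy data are hypothetical. The sketch runs sequentially; because each pairwise term is independent of the others, that loop is the kind of work that could be distributed, for example on Spark.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning so a numerical column can be treated as categorical.

    Assumption for this sketch only; the paper may handle numerical data differently.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def attribute_weights(columns):
    """Weight each attribute by its mean MI with the others, normalized to sum to 1.

    Hypothetical weighting rule used for illustration.
    """
    names = list(columns)
    raw = {}
    for a in names:
        others = [b for b in names if b != a]
        raw[a] = sum(
            mutual_information(columns[a], columns[b]) for b in others
        ) / len(others)
    total = sum(raw.values())
    if total == 0:
        # No attribute carries information about any other: fall back to uniform.
        return {a: 1 / len(names) for a in names}
    return {a: raw[a] / total for a in names}


# Toy mixed dataset: one numerical attribute and two categorical attributes.
height = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
columns = {
    "size": discretize(height, bins=2),
    "label": ["s", "s", "s", "s", "l", "l", "l", "l"],
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
weights = attribute_weights(columns)
```

In this toy data, `size` and `label` determine each other while `noise` is independent of both, so under the assumed rule the two correlated attributes share all of the weight. A weighted outlier score would then give more emphasis to deviations in highly weighted attributes.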
