Abstract

When dealing with uncertain data, traditional model construction methods often ignore or filter out noisy data to improve model performance. However, this simple approach can lead to underutilized data, model bias, reduced detection ability, and decreased robustness of the detection model. Outliers can be regarded as data that are inconsistent with other patterns at specific moments; they are not always negative data, so their appearance is not necessarily bad. In data analysis, outliers play a crucial role in sample vector recognition, missing-value processing, and model stability verification. In addition, unsupervised models incur very high computational costs when recognizing outliers, especially non-parametric unsupervised models. To address these problems, we adopt a semi-supervised learning process and use similarity as the negative selection criterion to propose a local density verification detection model (Vd-LOD). The model establishes similarity pseudo-labels for multi-label and multi-type samples, verifies candidate outliers based on the local outlier factor (LOF), and increases the detector's sensitivity to outliers. Experimental results show that, under different parameter settings and varying numbers of outliers, Vd-LOD outperforms other detection models: although verifying the presence of relationships significantly increases average time consumption, it yields an approximately 6% improvement in average detection accuracy.
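The abstract's verification step builds on the local outlier factor. As a rough illustration of the standard LOF computation only (not the Vd-LOD model, whose pseudo-labeling and negative selection are described in the paper itself), a minimal pure-Python sketch on a toy dataset:

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor for each point (illustrative sketch).

    points: list of coordinate tuples; k: number of neighbors.
    LOF >> 1 suggests an outlier; LOF close to 1 suggests an inlier.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors (indices) of each point, excluding itself
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor
    kdist = [dist[i][knn[i][-1]] for i in range(n)]

    # reachability distance of point i from neighbor j
    def reach(i, j):
        return max(kdist[j], dist[i][j])

    # local reachability density: inverse of mean reachability distance
    lrd = [k / sum(reach(i, j) for j in knn[i]) for i in range(n)]

    # LOF: mean ratio of neighbors' densities to the point's own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

# a tight cluster plus one distant point
data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = lof_scores(data, k=2)
```

Here the four clustered points receive LOF scores near 1, while the isolated point (8, 8) receives a much larger score and would be flagged as an outlier.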
