Abstract

Density peaks clustering (DPC) is a well-known density-based clustering algorithm that handles nonspherical clusters well. However, DPC has high computational and space complexity when calculating the local density ρ and the distance δ, which makes it suitable only for small-scale data sets. In addition, its performance on high-dimensional data still needs improvement: high-dimensional data not only make the data distribution more complex but also incur more computational overhead. To address these issues, we propose an improved density peaks clustering algorithm that combines feature reduction with a data sampling strategy. Specifically, features of the high-dimensional data are automatically extracted by principal component analysis (PCA), an auto-encoder (AE), and t-distributed stochastic neighbor embedding (t-SNE). To reduce the computational overhead, we then propose a novel data sampling method for the low-dimensional feature data. First, the data distribution in the low-dimensional feature space is estimated by a Quasi-Monte Carlo (QMC) sequence with low-discrepancy characteristics. Then, representative QMC points are selected according to their cell densities, and the selected QMC points are used to calculate ρ and δ instead of the original data points. In general, the number of selected QMC points is much smaller than the size of the initial data set. Finally, a two-stage classification strategy based on the clustering results of the QMC points is proposed to classify the original data set. Compared with current works, the proposed algorithm reduces the computational complexity from O(n²) to O(Nn), where N denotes the number of selected QMC points and n is the size of the original data set, typically N ≪ n. Experimental results demonstrate that the proposed algorithm can effectively reduce the computational overhead and improve the model performance.
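The QMC sampling step described above can be sketched as follows. This is a minimal illustration only, assuming 2-D feature data already scaled into the unit square and using a hand-rolled Halton low-discrepancy sequence; the `min_count` threshold is a hypothetical stand-in for the paper's cell-density selection rule.

```python
import math

def van_der_corput(i, base):
    """i-th element of the van der Corput sequence in the given base."""
    x, f = 0.0, 1.0
    while i > 0:
        f /= base
        x += f * (i % base)
        i //= base
    return x

def halton_2d(n):
    """First n points of the 2-D Halton low-discrepancy sequence."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3))
            for i in range(1, n + 1)]

def select_representatives(data, n_qmc, min_count=1):
    """Keep QMC points whose 'cell' (nearest-point region) holds enough data.

    `data` is assumed to be scaled into the unit square; `min_count` is an
    illustrative threshold, not the paper's exact selection criterion.
    """
    qmc = halton_2d(n_qmc)
    counts = [0] * n_qmc
    for p in data:
        nearest = min(range(n_qmc), key=lambda j: math.dist(p, qmc[j]))
        counts[nearest] += 1
    return [q for q, c in zip(qmc, counts) if c >= min_count]
```

Running the ρ/δ computation on the N surviving representatives instead of all n original points is what yields the O(Nn) overall cost, since each original point is only compared against the representatives.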

Highlights

  • With the advent of the era of big data, the importance of data mining is increasingly prominent [1]

  • Normalized mutual information (NMI) quantifies the similarity between the predicted labels and the true labels and is used to measure the robustness of the algorithm

  • Since the unlabeled data sets have no true labels, the evaluation criteria Acc, F-measure, NMI, and adjusted Rand index (ARI) cannot be applied to them
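As a sketch of how the NMI criterion mentioned above is computed, the following hypothetical helper implements it from counts with square-root normalization (libraries such as scikit-learn provide equivalent functions; the normalization variant is an assumption here):

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized mutual information, NMI = I(U;V) / sqrt(H(U) * H(V))."""
    n = len(labels_true)
    ct = Counter(labels_true)                 # cluster sizes in true labels
    cp = Counter(labels_pred)                 # cluster sizes in predicted labels
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information from the joint and marginal counts
    mi = sum((c / n) * math.log(c * n / (ct[u] * cp[v]))
             for (u, v), c in joint.items())
    h_true = -sum((c / n) * math.log(c / n) for c in ct.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in cp.values())
    if h_true == 0.0 or h_pred == 0.0:
        return 0.0
    return mi / math.sqrt(h_true * h_pred)
```

NMI is invariant to label permutations (a relabeled but identical partition scores 1), which is why it can compare predicted cluster IDs against ground-truth classes directly.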



Introduction

With the advent of the era of big data, the importance of data mining is increasingly prominent [1]. The density peaks clustering (DPC) algorithm was proposed in [5]. It is a typical density-based clustering algorithm built on two assumptions: (1) cluster centers are surrounded by neighbors with lower local density, and (2) the distance between a cluster center and any data point with higher density is relatively large. The cluster centers are therefore data points with both high local density and large distance, which are called density peaks. Another advantage is that DPC can deal with clusters of arbitrary shape and does not need the number of categories to be determined in advance.
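The two quantities behind those assumptions can be sketched in a few lines of Python. This is a minimal illustration using the cutoff-kernel density (a common DPC variant), not the paper's optimized implementation: ρ counts neighbors within a cutoff distance d_c, and δ is the distance to the nearest point of higher density.

```python
import math

def dpc_rho_delta(points, d_c):
    """Compute DPC local density rho (cutoff kernel) and distance delta."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # For the globally densest point(s), delta is conventionally the
        # maximum distance to any other point
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Density peaks then stand out as the points where both ρ and δ are large, which is what lets DPC pick cluster centers without fixing the number of clusters in advance.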
