Density Peaks Clustering Algorithm for Large-scale Data Based on Divide-and-Conquer Strategy

Yining Wang

doi:10.1109/mlbdbi54094.2021.00084

Abstract

The density peaks clustering algorithm is a simple but effective clustering method: it requires few parameters and iterations, and it can determine the number of clusters. However, the algorithm has high time and space complexity, because it computes the distances between all pairs of points. This makes it unsuitable for clustering large-scale data. Therefore, this paper proposes a density peaks clustering algorithm based on the divide-and-conquer strategy. First, the large-scale data are divided into a series of data blocks whose distribution is consistent with that of the original data. Then, density peaks clustering is performed on a randomly selected block to obtain the cluster centers of that block. Because the block has the same distribution as the original large-scale data, these cluster centers can be used as the cluster centers of the original data. Finally, the data in the remaining blocks are allocated to the corresponding cluster centers to obtain the final clustering result. Experimental results on real and synthetic datasets verify the effectiveness of the proposed algorithm.
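The divide-cluster-allocate pipeline summarized above can be illustrated with a minimal sketch. It is not the author's implementation, and several choices are assumptions:

- A random shuffle split into equal-size blocks stands in for "blocks consistent with the original data distribution".
- Local density uses a cutoff kernel with radius `dc`.
- The `k` points with the largest rho × delta are taken as centers instead of reading them off a decision graph.
- Every point is allocated to its nearest center.

The function names `dpc_centers` and `divide_and_conquer_dpc`, and the parameters `n_blocks`, `dc`, and `k`, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dpc_centers(block: Sequence[Point], dc: float, k: int) -> List[Point]:
    """Density peaks on one block; returns the k points with the largest rho * delta."""
    n = len(block)
    # Full pairwise distance matrix: O(n^2) time and space, affordable for one block.
    dist = [[math.dist(p, q) for q in block] for p in block]
    # rho_i: number of other points closer than the cutoff distance dc (cutoff kernel).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Visit points by decreasing density; Python's stable sort breaks ties by index.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    for r, i in enumerate(order):
        if r == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = max(dist[i])
        else:
            # delta_i: distance to the nearest point that precedes i in density order.
            delta[i] = min(dist[i][j] for j in order[:r])
    # Assumption: pick the k largest gamma = rho * delta instead of using a decision graph.
    top = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:k]
    return [block[i] for i in top]


def divide_and_conquer_dpc(
    data: Sequence[Point], n_blocks: int, dc: float, k: int, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Hypothetical pipeline: split, run DPC on one random block, allocate all points."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    # Assumption: a random shuffle split into equal blocks approximates the data distribution.
    blocks = [idx[b::n_blocks] for b in range(n_blocks)]
    sample = blocks[rng.randrange(n_blocks)]
    centers = dpc_centers([data[i] for i in sample], dc, k)
    # Assumption: each point (the sampled block included, for simplicity) joins its nearest center.
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
    return centers, labels
```

For example, consider two well-separated Gaussian blobs of 200 points each. Calling `divide_and_conquer_dpc(data, n_blocks=4, dc=1.0, k=2)` runs the quadratic density-peaks step on only about a quarter of the points. The allocation step then costs time proportional to the number of points times `k`. This trade-off is what makes the divide-and-conquer strategy attractive for large-scale data.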
