A Two-Stage Clustering Algorithm Based on Improved K-Means and Density Peak Clustering

Na Xiao, Xu Zhou, Zhibang Yang, Xin Huang

doi:10.1109/icbk.2019.00047

Abstract

The density peak clustering algorithm (DPC) has attracted wide attention from researchers since it was proposed. Its advantage lies in its ability to cluster data efficiently on the basis of two simple assumptions. A key step in DPC is the manual selection of cluster centers from the decision graph: the quality of the decision graph determines the quality of the selected centers and, in turn, the quality of the clustering result. The decision graph itself depends on the cutoff distance parameter dc. Although the original authors of DPC proposed an empirical rule for choosing dc, this rule does not work well on many real-world datasets, where the user must adjust the parameter repeatedly to obtain a good decision graph. Manually selecting cluster centers is therefore not an easy task. In this paper, we combine the clustering ideas of K-means and DPC and propose KDPC, a two-stage clustering algorithm that acquires cluster centers automatically. In the first stage, KDPC uses an improved K-means algorithm to obtain high-quality cluster centers. In the second stage, KDPC assigns the remaining data points to clusters following the clustering idea of DPC. Experiments show that KDPC achieves good clustering results on both artificial and real-world datasets. Moreover, compared with DPC, KDPC produces better clustering results on datasets whose clusters differ significantly in density.
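The two stages summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions. It uses the cutoff-kernel local density rho_i (the number of other points closer than dc) and the distance delta_i to the nearest denser point, the two axes of the DPC decision graph. Plain Lloyd's K-means (the hypothetical helper `kmeans`) stands in for the paper's improved K-means, whose details are not given here. As a further hypothetical choice, the densest member of each K-means cluster is taken as that cluster's center. All function names (`local_density`, `delta_and_parent`, `kmeans`, `kdpc_sketch`) are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def local_density(points: Sequence[Point], dc: float) -> List[int]:
    """Cutoff-kernel density: rho_i = number of other points closer than dc."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < dc)
        for i in range(n)
    ]


def delta_and_parent(points: Sequence[Point], rho: Sequence[int]):
    """Rank points by density and find each point's nearest higher-ranked neighbour.

    Ties in rho are broken by index (sorted() is stable), so every point except
    the top-ranked one has a 'parent'. delta_i is the distance to that parent;
    for the top-ranked point it is the largest distance to any point.
    """
    n = len(points)
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent: List[Optional[int]] = [None] * n
    top = order[0]
    delta[top] = max(math.dist(points[top], p) for p in points)
    for k in range(1, n):
        i = order[k]
        j = min(order[:k], key=lambda m: math.dist(points[i], points[m]))
        parent[i] = j
        delta[i] = math.dist(points[i], points[j])
    return order, delta, parent


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's K-means with farthest-first seeding; returns cluster indices.

    Hypothetical stand-in: NOT the paper's improved K-means.
    """
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    assign: List[int] = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            # Keep the old centroid if a cluster becomes empty.
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign


def kdpc_sketch(points: Sequence[Point], k: int, dc: float):
    """Two-stage sketch: K-means picks centers, DPC-style chaining labels the rest."""
    rho = local_density(points, dc)
    order, _delta, parent = delta_and_parent(points, rho)
    rank = [0] * len(points)
    for pos, i in enumerate(order):
        rank[i] = pos
    # Stage 1 (assumption): in each K-means cluster, the highest-ranked
    # (densest) member becomes that cluster's center.
    assign = kmeans(points, k)
    centers = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            centers.append(min(members, key=lambda i: rank[i]))
    labels = [-1] * len(points)
    for lab, i in enumerate(centers):
        labels[i] = lab
    # Stage 2: visit points in decreasing density; a non-center inherits the
    # label of its parent, which was visited (and labelled) earlier. The
    # top-ranked point is always a center, so its missing parent is never used.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    centers, labels = kdpc_sketch(pts, k=2, dc=2.0)
    print(centers, labels)  # [0, 5] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this sketch the number of clusters is fixed by k, so no decision graph has to be read by hand. The parameter dc influences only the density ranking: which member of each K-means cluster becomes its center, and the order in which labels propagate in the second stage.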
