DP-Kmeans and Beyond: Optimal Clustering with a new Clustering Validity Index

Zhu-Juan Ma, Feng Liu, Xiang-Hua Chen, Zi-Han Wang

doi:10.53106/199115992022103305001

Abstract

The K-means clustering algorithm is widely used in many areas because of its high efficiency. However, the performance of traditional K-means is very sensitive to the selection of the initial cluster centers. Furthermore, although it handles convexly distributed datasets well, traditional K-means still cannot optimally process many non-convexly distributed datasets or datasets with outliers. To this end, this paper proposes DP-Kmeans, an improved K-means algorithm based on the Density Parameter and center replacement, which can be more accurate than traditional K-means by dropping the random selection of the initial cluster centers and by continuously updating the new centers. Because clustering is unsupervised, neither the number of clusters nor the quality of the data partitions generated by a clustering algorithm can be guaranteed. To evaluate the results of the DP-Kmeans algorithm, this paper proposes SII, a new clustering validity index based on the Sum of the Inner-cluster compactness and the Inter-cluster separateness. Based on the DP-Kmeans algorithm and the SII index, a new method is proposed to determine the optimal number of clusters for different datasets. Experimental results on ten datasets with different distributions demonstrate that the proposed clustering method is more effective than the existing ones.
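
The general workflow described in the abstract (a non-random initialization, a K-means run, and a validity index used to pick the number of clusters) can be illustrated with a minimal sketch. This is not the paper's DP-Kmeans or its SII index, whose definitions are not given in the abstract. As stated assumptions, the sketch swaps in a deterministic farthest-first initialization for the density-based one, and a simple separation-to-compactness ratio for SII. All function names (`farthest_first_centers`, `kmeans`, `validity_score`, `best_k`) are hypothetical.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic stand-in for a density-based initialization (assumption).

    Starts from the point closest to the data mean, then repeatedly adds
    the point farthest from all centers chosen so far.
    """
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations starting from the deterministic centers."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def validity_score(points, centers, labels):
    """Illustrative index (NOT the paper's SII): separation / compactness."""
    # Compactness: mean distance from each point to its own center.
    compactness = sum(
        math.dist(p, centers[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    # Separation: smallest distance between any two centers.
    separation = min(
        math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return separation / compactness if compactness > 0 else float("inf")


def best_k(points, k_max):
    """Sweep k = 2..k_max and return the k with the highest score."""
    scores = {}
    for k in range(2, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = validity_score(points, centers, labels)
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, k_max)` on a list of coordinate tuples returns the selected number of clusters together with the per-k scores. The sweep-and-score structure is the part that mirrors the abstract; the specific initialization and index would need to be replaced with the paper's definitions to reproduce its method.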
