An improved k-means algorithm based on density normalization

Xiaojun Wu, Xiaochun Wang, Sheng Yuan, Jingjing Wei, Zihong Chen

doi:10.1109/iciba52610.2021.9687899

Abstract

To improve the accuracy of the k-means algorithm, many improved variants that select better initial center points have been proposed. However, even when the initial centers are well chosen, the size, shape and density of the clusters in the dataset still affect clustering performance. An improved k-means algorithm based on a density normalization method (DNK-means) is proposed, which uses data transformation to address this problem. The main idea of the algorithm is to calculate the nearest neighbor density of each point and, among each point's nearest neighbors, find the point with the highest density. The point with the highest nearest neighbor density is regarded as a candidate point. A minimum spanning tree is constructed over these candidate points to obtain the initial centers. The normalized dataset is obtained by transforming every point in the dataset to its nearest neighbor with the highest density. Finally, k-means is applied to the processed dataset to obtain the final clustering result. The DNK-means algorithm is tested on several well-known datasets from the UCI machine learning repository. The results show that our algorithm achieves better clustering results than the traditional k-means algorithm and the latest improved k-means algorithm.
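The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: density is taken as the inverse of the mean distance to the m nearest neighbors, "transforming" a point means replacing it with the densest point among itself and those neighbors, and the minimum-spanning-tree initialization is swapped for a simple farthest-first selection over the candidate points. All function names and the parameter m are hypothetical.

```python
import math

def knn_indices(points, m):
    """Indices of the m nearest neighbors of each point (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:m])
    return result

def knn_density(points, neighbors):
    """ASSUMED density: inverse of the mean distance to the m nearest neighbors."""
    dens = []
    for i, nbrs in enumerate(neighbors):
        mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        dens.append(1.0 / (mean_d + 1e-12))
    return dens

def density_normalize(points, m):
    """Map every point onto the densest point among itself and its neighbors."""
    neighbors = knn_indices(points, m)
    dens = knn_density(points, neighbors)
    target = [max([i] + nbrs, key=lambda j: dens[j])
              for i, nbrs in enumerate(neighbors)]
    candidates = sorted(set(target))          # candidate points
    transformed = [points[t] for t in target]  # normalized dataset
    return transformed, candidates, dens

def pick_initial_centers(points, candidates, dens, k):
    """STAND-IN for the paper's MST step: farthest-first over the candidates."""
    if len(candidates) < k:
        raise ValueError("fewer candidate points than clusters; try a smaller m")
    chosen = [max(candidates, key=lambda j: dens[j])]
    while len(chosen) < k:
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: min(
            math.dist(points[c], points[s]) for s in chosen)))
    return [tuple(points[c]) for c in chosen]

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations from the given initial centers."""
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def dnk_means_sketch(points, k, m=3):
    """Density-normalize the data, then run k-means on the transformed points."""
    transformed, candidates, dens = density_normalize(points, m)
    init = pick_initial_centers(points, candidates, dens, k)
    return kmeans(transformed, init)

# Two small, well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = dnk_means_sketch(data, k=2, m=3)
```

In this toy dataset, each group collapses onto its central point during the transformation step, so k-means operates on a dataset in which differences in cluster size and density have been flattened. The brute-force neighbor search is quadratic in the number of points and is chosen only for brevity.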

Full Text