Due to simplicity, K-means has become a widely used clustering method. However, its clustering result is seriously affected by the initial centers and the allocation strategy makes it hard to identify manifold clusters. Many improved K-means are proposed to accelerate it and improve the quality of initialize cluster centers, but few researchers pay attention to the shortcoming of K-means in discovering arbitrary-shaped clusters. Using graph distance (GD) to measure the dissimilarity between objects is a good way to solve this problem, but computing the GD is time-consuming. Inspired by the idea that granular ball uses a ball to represent the local data, we select representatives from a local neighborhood, called natural density peaks (NDPs). On the basis of NDPs, we propose a novel K-means algorithm for identifying arbitrary-shaped clusters, called NDP-Kmeans. It defines neighbor-based distance between NDPs and takes advantage of the neighbor-based distance to compute the GD between NDPs. Afterward, an improved K-means with high-quality initial centers and GD is used to cluster NDPs. Finally, each remaining object is assigned according to its representative. The experimental results show that our algorithms can not only recognize spherical clusters but also manifold clusters. Therefore, NDP-Kmeans has more advantages in detecting arbitrary-shaped clusters than other excellent algorithms.
Read full abstract