Abstract

As a classical clustering algorithm, K-means has been widely applied because of its simple mathematical formulation, fast convergence, low complexity, and ease of implementation. However, the K-means algorithm requires users to set the desired number of clusters in advance, and the initial cluster centers are usually generated at random. When dealing with unknown datasets for which users lack sufficient domain knowledge, such parameter-setting strategies not only increase the burden on users but also make clustering quality difficult to guarantee. Therefore, in view of the high sensitivity of the K-means clustering process to its initial parameters, this paper proposes an improved DDWK-means (Distance-Density-Weight K-means) algorithm. Based on the distance-density feature and the inertia weight method of the particle swarm optimization algorithm, the optimal initial cluster centers can be determined adaptively from the structural characteristics of the dataset itself, without introducing artificial parameters, and can also be adjusted dynamically in response to changes in the threshold of the clustering quality metric. We conduct an experimental study on five standard datasets from the UCI (University of California, Irvine) repository, and the results indicate that the DDWK-means algorithm exhibits a significant improvement in clustering efficiency and stability.
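
To make the idea of distance-density seeding concrete, the following is a minimal sketch and not the DDWK-means algorithm itself, whose details are not given in this abstract. It assumes a simple heuristic: local density is counted within a radius equal to the mean pairwise distance, and each new seed maximizes density times distance to the nearest existing seed. The function names `density_distance_seeds` and `kmeans` are hypothetical, the number of clusters `k` is passed in rather than determined adaptively, and the inertia-weight adjustment is not modeled.

```python
import math


def density_distance_seeds(points, k):
    """Pick k initial centers that are dense and mutually distant.

    Illustrative heuristic only; not the seeding rule of DDWK-means.
    """
    n = len(points)
    # Full pairwise Euclidean distance matrix.
    d = [[math.dist(p, q) for q in points] for p in points]
    # Radius taken from the data itself: the mean pairwise distance.
    radius = sum(map(sum, d)) / (n * (n - 1))
    # Local density: how many other points lie within the radius.
    density = [
        sum(1 for j in range(n) if j != i and d[i][j] < radius)
        for i in range(n)
    ]
    # First seed: the densest point (first one on ties).
    seeds = [max(range(n), key=lambda i: density[i])]
    while len(seeds) < k:
        # Favor points that are dense and far from every chosen seed.
        def score(i):
            return density[i] * min(d[i][s] for s in seeds)

        seeds.append(max(range(n), key=score))
    return [points[i] for i in seeds]


def kmeans(points, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: mean of each cluster; keep the old center if empty.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Because the seeds are chosen deterministically from the data, repeated runs on the same dataset start from the same centers, which is the kind of stability that random initialization cannot offer.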
