Abstract

Because its results depend on parameter settings and on randomly selected initial cluster centers, the traditional K-means algorithm is unstable. A clustering validity index (CVI) is an important tool for evaluating the quality of the results produced by clustering algorithms. However, many existing CVIs are unstable, have a narrow range of applications, and cannot properly handle datasets with non-spherical distributions or with a large number of overlapping points. To address these problems, the traditional K-means algorithm is first improved by using the dynamic average distance to find the initial cluster centers rather than selecting them randomly. Second, based on the idea of dynamic average distance, a new clustering validity index, DCVI, is proposed. The new DCVI can handle many kinds of datasets, including non-convex datasets and datasets with a large number of overlapping points. Third, by integrating the improved K-means algorithm with the new DCVI, a new algorithm (KVOA) is designed to determine the optimal number of clusters (Kopt) for a wide range of datasets. Experimental results on several datasets demonstrate that the improved K-means algorithm is more accurate and more stable than the traditional one. The new DCVI is also compared with six commonly used CVIs, and the results show that it is more accurate and more stable than the others.
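
The overall workflow described above (deterministic seeding, K-means, then scoring each candidate cluster count with a validity index) can be illustrated with a minimal sketch. The abstract does not define the dynamic average distance or the DCVI formula, so this sketch substitutes two well-known stand-ins: farthest-point seeding in place of the dynamic-average-distance initialization, and the mean silhouette score in place of DCVI. All function names here (`farthest_point_init`, `kmeans`, `silhouette`, `select_k`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding (stand-in, NOT the paper's dynamic-average-distance rule).
    Start at the point nearest the data mean, then repeatedly add the point
    farthest from every center chosen so far."""
    first = np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centers = [X[first]]
    for _ in range(1, k):
        C = np.array(centers)
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).min(axis=1)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's iterations from a deterministic start, so repeated runs agree."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to lose all its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

def silhouette(X, labels):
    """Mean silhouette score (stand-in validity index, NOT the paper's DCVI)."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton cluster scores 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in ids if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

def select_k(X, k_min=2, k_max=8):
    """Sweep candidate cluster counts and return the best-scoring one."""
    scores = {}
    for k in range(k_min, k_max + 1):
        labels, _ = kmeans(X, k)
        scores[k] = silhouette(X, labels)
    return max(scores, key=scores.get), scores
```

Because the seeding is deterministic, calling `select_k` twice on the same data returns the same cluster count, which is the stability property the abstract emphasizes. Whether the paper's own index behaves like the silhouette score on non-convex or heavily overlapping data cannot be inferred from this sketch.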
