Abstract
Because it depends on parameter settings and on randomly selected initial clustering centers, the traditional K-means algorithm is unstable. A clustering validity index (CVI) is an important tool for evaluating the quality of the results produced by clustering algorithms. However, many existing CVIs are unstable, have a narrow range of application, and cannot properly handle datasets with non-spherical distributions or with a large number of overlapping points. To address these problems, the traditional K-means algorithm is first improved by using the dynamic average distance to find the initial clustering centers rather than selecting them randomly. Second, based on the idea of dynamic average distance, a new clustering validity index, DCVI, is proposed. DCVI can handle many kinds of datasets, including non-convex datasets and datasets with a large number of overlapping points. Third, by integrating the improved K-means algorithm with DCVI, a new algorithm (KVOA) is designed to determine the optimal clustering number (Kopt) for a wide range of datasets. Experiments on several datasets show that the improved K-means algorithm is more accurate and more stable than the traditional algorithm. DCVI is also compared with six commonly used CVIs, and the results show that it is more accurate and more stable than the others.
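The general pattern the abstract describes (deterministic seeding, then a sweep over candidate cluster counts scored by a validity index) can be illustrated with a minimal sketch. The abstract does not define the dynamic average distance, DCVI, or the internals of KVOA, so none of them are reproduced here: farthest-point seeding and the mean silhouette coefficient are used as plainly named stand-ins, and every function name below (`farthest_point_init`, `kmeans`, `silhouette`, `choose_k`) is hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (a stand-in, NOT the paper's dynamic average distance)."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    # Start from the data point nearest the overall mean.
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    # Repeatedly add the point farthest from all centers chosen so far.
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations started from the deterministic seeds above."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

def silhouette(points, labels):
    """Mean silhouette coefficient (a stand-in validity index, NOT DCVI)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # a singleton cluster contributes 0
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max):
    """Score each K in 2..k_max and return the best K with all scores."""
    scores = {}
    for k in range(2, k_max + 1):
        labels, _ = kmeans(points, k)
        scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Toy data: three small, well-separated groups of 2-D points.
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
best_k, scores = choose_k(points, 5)
print(best_k, scores)
```

Because the seeding is deterministic, repeated runs on the same data give the same partition, which is the stability property the abstract is after. The silhouette stand-in favours compact, roughly convex clusters, so this sketch does not address the non-convex and heavily overlapping cases that DCVI is said to handle.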