Abstract
The number of clusters k is one of the key factors influencing cluster quality in the K-means algorithm. Several cluster validity measures have been proposed for determining the optimal value of k. However, the existing methods may not work well for two kinds of data sets: those containing cluster groups with different densities, and those in which the cluster groups are extremely close to each other. Therefore, a new cluster validity index is proposed. The index is defined as the ratio between the squared total length of the data eigen-axes and the between-cluster separation (the data set containing a merged cluster group). The number of clusters at which the index reaches its minimum is taken as the optimal one. In addition, to reduce the sensitivity of the K-means algorithm to isolated points and noise, a weight-based clustering algorithm, K-wmeans, is put forward for calculating the cluster centers. Experimental results show that the proposed algorithm gives more accurate results than the other algorithms tested. The modified K-means algorithm based on the new cluster validity index not only reduces the impact of isolated points and noise but also deals effectively with the two kinds of data sets mentioned above, improving the quality of data clustering.
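The idea of computing cluster centers from weighted points can be illustrated with a minimal sketch. The abstract does not specify how K-wmeans assigns weights, so the scheme below (each point weighted by the inverse of its mean distance to all other points) is purely an assumption chosen for illustration, and the function names `density_weights` and `weighted_kmeans` are hypothetical. It is not the authors' algorithm, and it does not implement the proposed validity index.

```python
import numpy as np

def density_weights(X):
    """Hypothetical weighting (an assumption, not from the paper):
    inverse of each point's mean distance to all other points,
    so isolated points receive small weights."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    mean_d = d.sum(axis=1) / (len(X) - 1)
    w = 1.0 / (mean_d + 1e-12)
    return w / w.sum()

def assign(X, centers):
    """Assign each point to its nearest center (Euclidean distance)."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """Lloyd-style iterations in which each center is the weighted
    mean of its assigned points instead of the plain mean."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster is empty
                new_centers[j] = np.average(X[mask], axis=0, weights=weights[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

# Four tightly grouped points plus one isolated point.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
w = density_weights(X)
centers, labels = weighted_kmeans(X, k=1, weights=w)
plain_mean = X.mean(axis=0)
```

With a single cluster, the weighted center in this toy example lies closer to the dense group than the plain mean does, because the isolated point carries the smallest weight. Any other weighting that downweights outliers could be substituted for `density_weights` without changing the update step.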