Abstract
The choice of initial centroids has a major impact on the performance and accuracy of the k-means algorithm when grouping data objects into clusters. In basic k-means, purely arbitrary selection of the initial centroids leads to different clusters on every run, which in turn affects the algorithm's performance and accuracy. Researchers have made several attempts to address this, but scope for improvement remains. This paper therefore proposes a new approach to centroid initialization for k-means, based on choosing well-separated data objects as the initial cluster centroids instead of selecting them purely arbitrarily. As a consequence, the chosen centroids have a higher probability of lying close to the final cluster centroids. The proposed algorithm is empirically assessed on 6 different well-known datasets. The results confirm that the proposed approach is considerably better than purely arbitrary selection of centroids.

Keywords: Data mining, k-means algorithm, Cluster initialization, Clustering, Cluster validation
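The abstract does not spell out the selection rule, so the following is only a rough sketch of the general idea of picking well-separated data objects as initial centroids. It uses farthest-first (maximin) traversal, a standard technique swapped in here for illustration, and should not be read as the proposed algorithm. The names `farthest_first_centroids`, `points`, `k`, and `seed` are hypothetical.

```python
import math
import random


def farthest_first_centroids(points, k, seed=0):
    """Pick k well-separated points by farthest-first (maximin) traversal.

    Illustrative stand-in only: the source abstract does not specify its
    selection rule, so this is an assumption, not the paper's method.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # The first centroid is still arbitrary; the rest are chosen by distance.
    centroids = [points[rng.randrange(len(points))]]
    # Distance from every point to its nearest already-chosen centroid.
    nearest = [math.dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        # Take the point that lies farthest from all centroids chosen so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centroids.append(points[idx])
        # Update each point's distance to its nearest chosen centroid.
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centroids


# Toy data (made up for illustration): three visibly separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
initial = farthest_first_centroids(points, k=3)
```

On this toy data the three returned points fall in three different groups, whereas three purely arbitrary picks could easily land in the same group. The returned points would then be passed to a standard k-means loop as its starting centroids.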