Development of new seed with modified validity measures for k-means clustering

S Manochandar, M Punniyamoorthy, R.K Jeyachitra

doi:10.1016/j.cie.2020.106290

Abstract

Conventional k-means clustering is a widely used partitional method, applied mainly to machine learning and pattern recognition problems. The algorithm is highly sensitive to the initial centroid points, and because these centroids are chosen randomly for the given clusters, it cannot guarantee a good solution. In this paper, we develop a new initialization method for k-means clustering. We also propose a modification of the Dunn Index and introduce a new validity ratio based on the silhouette index. The sum of squared error, Dunn Index, silhouette index, modified Dunn Index, and silhouette validity ratio are used as criteria to evaluate the performance of the initialization algorithm. Several benchmark datasets are used to assess the effectiveness of the proposed initialization algorithm, and the results are compared with those of the conventional k-means and k-means++ algorithms. The results show that the proposed initialization algorithm yields the lowest sum of squared error and the fewest iterations among the compared methods. A precision chart is used to test the consistency of the initialization algorithm. The comparative analysis, based on the modified Dunn Index and the silhouette validity ratio, shows that the proposed initialization algorithm performs better than the other initialization algorithms.
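
For readers unfamiliar with the classical criteria named in the abstract, the following minimal sketch shows how the sum of squared error, the standard Dunn Index, and the mean silhouette index can be computed for a fixed clustering. It is an illustration only, not the authors' implementation: the function names and the toy data are hypothetical, the textbook definitions are assumed, and the modified Dunn Index and silhouette validity ratio are not shown because the abstract does not define them.

```python
import math
from itertools import combinations

def group_by_label(points, labels):
    """Collect points into a dict mapping cluster label -> list of points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters

def sum_squared_error(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        dim = len(members[0])
        centroid = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

def dunn_index(points, labels):
    """Standard Dunn Index: smallest between-cluster distance divided by
    the largest cluster diameter. Assumes at least two clusters and at
    least one cluster containing two distinct points."""
    clusters = list(group_by_label(points, labels).values())
    max_diameter = max(
        math.dist(a, b) for c in clusters for a, b in combinations(c, 2)
    )
    min_separation = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    return min_separation / max_diameter

def mean_silhouette(points, labels):
    """Mean silhouette index over all points (singleton clusters score 0)."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]

total_sse = sum_squared_error(points, labels)
dunn = dunn_index(points, labels)
silhouette = mean_silhouette(points, labels)
print(total_sse, dunn, silhouette)
```

A lower sum of squared error indicates more compact clusters, while higher Dunn and silhouette values indicate better-separated clusters, which is why such criteria can be used to compare the partitions produced by different initialization schemes.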
