This paper proposes a new family of k′-means algorithms for cluster analysis, based on three frequency-sensitive data discrepancy metrics, for cases in which the exact number of clusters in a dataset is not known in advance. The number k of seed points used to learn the clusters is set larger than the true number k′ of clusters in the dataset, i.e., k > k′. The algorithms then locate the centers of the k′ actual clusters with k′ converged seed points, one per cluster, while the remaining k − k′ seed points correspond to empty clusters, that is, clusters that win no data points in the competition under the frequency-sensitive discrepancy metric in use. Experiments on both synthetic and real-world datasets show that the three new k′-means algorithms detect the number of actual clusters in a dataset with a classification accuracy as high as or higher than that of the original k′-means algorithm, and that they converge more quickly.
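The competition described above can be illustrated with a minimal sketch. The abstract does not define the three discrepancy metrics, so the code below assumes a simple stand-in: the squared Euclidean distance to a seed point plus a penalty `-penalty * log(freq_j)`, where `freq_j` is the fraction of points that seed point won in the previous pass. The function name `frequency_sensitive_kmeans` and the parameter `penalty` are hypothetical and are not taken from the paper. Under this assumed rule, a seed point that wins few points becomes less competitive, and one that wins none is excluded from later passes, which is one way extra seed points can end up with empty clusters.

```python
import numpy as np


def frequency_sensitive_kmeans(X, k, penalty=1.0, n_iter=50, seed=0):
    """Batch k-means-style loop with an assumed frequency-based penalty.

    Illustrative only: the penalty term is an assumption, not one of the
    paper's three metrics.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct data points as seed points (k > true cluster count).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    freq = np.full(k, 1.0 / k)  # winning frequency of each seed point
    labels = np.zeros(len(X), dtype=int)
    counts = np.zeros(k, dtype=int)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every seed point.
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

        # Assumed penalty: rarely winning seed points pay a larger cost;
        # a seed point with zero wins gets infinite cost and stays empty.
        with np.errstate(divide="ignore", invalid="ignore"):
            bias = np.where(freq > 0, -penalty * np.log(freq), np.inf)

        labels = (sq_dist + bias[None, :]).argmin(axis=1)
        counts = np.bincount(labels, minlength=k)
        freq = counts / len(X)

        # Move each non-empty seed point to the mean of the points it won.
        for j in range(k):
            if counts[j] > 0:
                centers[j] = X[labels == j].mean(axis=0)

    return centers, labels, counts
```

In this sketch, the number of non-zero entries in `counts` after the loop serves as the estimate of k′. How many seed points actually become empty depends on `penalty`, the initialization, and the data, so the value would need tuning in practice.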