The seeding algorithms for spherical k-means clustering

Min Li, Dongmei Zhang, Dachuan Xu, Juan Zou

doi:10.1007/s10898-019-00779-w

Abstract

Spherical k-means clustering was introduced to cluster high-dimensional textual data in modern data analysis. It aims to partition a given set of unit-length points into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we study seeding algorithms for spherical k-means clustering, for its special case with separable sets, and for its generalization, $\alpha$-spherical k-means clustering. For spherical k-means clustering with separable sets, we present a constant-factor approximation algorithm, which can also be generalized to $\alpha$-spherical separable k-means clustering. By constructing a suitable function, we also show that well-known seeding algorithms for the k-means problem, such as k-means++ and k-means||, can be applied directly to $\alpha$-spherical k-means clustering. In addition to the theoretical analysis, a numerical experiment is included.
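To make the objective and the idea of seeding concrete, the following is a minimal Python sketch of k-means++-style seeding under cosine dissimilarity. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names (`normalize`, `cosine_dissimilarity`, `seed_centers`, `clustering_cost`) are hypothetical, and centers are simply drawn from the input points. The sketch relies on the identity $\|x - c\|^2 = 2(1 - \langle x, c \rangle)$ for unit vectors, so sampling proportionally to cosine dissimilarity matches the squared-distance sampling used by k-means++.

```python
import math
import random


def normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_dissimilarity(x, c):
    """Return 1 - <x, c> for unit vectors, clamped at 0 against rounding error."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def seed_centers(points, k, seed=0):
    """Pick k initial centers from unit-length points (hypothetical helper).

    The first center is uniform at random; each later center is sampled with
    probability proportional to its dissimilarity to the nearest chosen center.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    closest = [cosine_dissimilarity(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(closest) > 0:
            new = rng.choices(points, weights=closest, k=1)[0]
        else:
            # All points coincide with a chosen center; fall back to uniform.
            new = rng.choice(points)
        centers.append(new)
        closest = [min(d, cosine_dissimilarity(p, new))
                   for d, p in zip(closest, points)]
    return centers


def clustering_cost(points, centers):
    """Sum over points of the cosine dissimilarity to the nearest center."""
    return sum(min(cosine_dissimilarity(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    raw = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [-1, 0]]
    pts = [normalize(v) for v in raw]
    chosen = seed_centers(pts, k=3, seed=1)
    print(chosen, clustering_cost(pts, chosen))
```

The seeds would normally be handed to an iterative refinement step; that step is omitted here because the abstract concerns the seeding stage only.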

Full Text