K-means, together with the celebrated Lloyd's algorithm, is more than the clustering method it was originally designed to be. It has proven pivotal in speeding up many machine learning and data analysis techniques, such as indexing, nearest-neighbor search and prediction, data compression and, lately, inference with kernel machines. Here, we introduce an efficient extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the \(K\) cluster centroids as a product of sparse matrices, a feat made possible by recent results on approximating matrices as products of sparse factors. Such a decomposition reduces the complexity of the matrix-vector product between the factorized \(K\times D\) centroid matrix \({\mathbf {U}}\) and any vector from \({\mathcal {O}}\left( KD\right)\) to \({\mathcal {O}}\left( A \log B + B\right)\), with \(A=\min \left( K,D\right)\) and \(B=\max \left( K,D\right)\), where \(D\) is the dimension of the data. This drastic computational saving directly benefits the process of assigning a point to a cluster. We propose to learn such a factorization during Lloyd's training procedure. We show that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme, and we demonstrate the benefits of our approach experimentally.
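To make the complexity argument concrete, the sketch below (not the authors' implementation) contrasts the dense assignment step of K-means with one that routes the product through sparse factors. The sizes, the number of factors, their density, and the randomly generated factors themselves are illustrative assumptions standing in for a learned factorization of \({\mathbf {U}}\); the point is only that the per-point cost becomes proportional to the number of nonzeros in the factors rather than to \(KD\).

```python
import numpy as np
import scipy.sparse as sp

# Illustrative sizes: K clusters, D-dimensional data.
K, D = 256, 4096
rng = np.random.default_rng(0)

# Hypothetical sparse factors standing in for a learned factorization of U:
# the first factor is K x D, the remaining ones are D x D.
Q = 4
factors = [sp.random(K, D, density=0.02, format="csr", random_state=rng)]
factors += [sp.random(D, D, density=0.02, format="csr", random_state=rng)
            for _ in range(Q - 1)]

# Dense centroid matrix obtained by multiplying the factors out, plus the
# 0.5 * ||u_k||^2 terms precomputed once (they do not depend on the point x).
U = factors[0]
for S in factors[1:]:
    U = U @ S
U_dense = np.asarray(U.todense())
half_sq_norms = 0.5 * (U_dense ** 2).sum(axis=1)

def assign_dense(x):
    """argmin_k ||x - u_k||^2 via the dense product U @ x: O(K * D) per point."""
    return int(np.argmax(U_dense @ x - half_sq_norms))

def assign_factorized(x):
    """Same assignment through the sparse factors: cost ~ total nonzeros."""
    v = x
    for S in reversed(factors):   # apply the right-most factor first
        v = S @ v
    return int(np.argmax(v - half_sq_norms))

x = rng.standard_normal(D)
assert assign_dense(x) == assign_factorized(x)
```

With a handful of factors each holding on the order of \(B \log B\) nonzeros, the factorized assignment touches far fewer entries than the dense \(K \times D\) product, which is the saving the abstract quantifies.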