Abstract

A federated kernel k-means (FedKKM) algorithm is developed in this article to conduct distributed clustering with low memory consumption on user devices. In FedKKM, a federated eigenvector approximation (FEA) algorithm is designed to iteratively determine low-dimensional approximate vectors of the transformed feature vectors, using only low-dimensional random feature vectors. To maintain high communication efficiency in each iteration of FEA, a communication-efficient Lanczos algorithm (CELA) is further designed within FEA to reduce the communication cost. Based on the low-dimensional approximate vectors, the clustering result is obtained by leveraging a distributed linear k-means algorithm. A theoretical analysis shows that: 1) FEA has a convergence rate of O(1/T), where T is the number of iterations; 2) the scalability of FedKKM is not affected by the dataset size, since the communication cost of FedKKM is independent of the amount of user data; and 3) FedKKM is a (1+ϵ)-approximation algorithm. The experimental results show that FedKKM achieves clustering quality comparable to that of centralized kernel k-means. Compared with state-of-the-art schemes, FedKKM reduces the memory consumption on user devices by up to 94% and also reduces the communication cost by more than 40%.
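To make the pipeline described above concrete, the following is a minimal, centralized sketch of the general idea only; it is not the authors' FEA or CELA. It assumes an RBF kernel approximated by random Fourier features, uses a generic Lanczos eigensolver (SciPy's `eigsh`) in place of the paper's communication-efficient variant, and simulates user-held data partitions in a single process rather than over a real federated protocol.

```python
import numpy as np
from scipy.sparse.linalg import eigsh  # Lanczos-based eigensolver

# Hypothetical sketch of the FedKKM-style pipeline (not the authors' code):
# 1) each "user" maps its local data to low-dimensional random Fourier
#    features approximating an RBF kernel,
# 2) a Lanczos step recovers approximate top eigenvectors of the implicit
#    kernel matrix from those features,
# 3) plain (linear) k-means runs on the resulting low-dimensional embedding.

rng = np.random.default_rng(0)

def random_fourier_features(X, dim, gamma, rng):
    """Approximate an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=dim)
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)

# Simulated user-held data partitions (three users, 2-D points).
users = [rng.normal(loc=c, scale=0.3, size=(50, 2))
         for c in ([0, 0], [3, 3], [0, 3])]

# Step 1: each user computes low-dimensional random feature vectors locally.
dim = 64
Z = np.vstack([random_fourier_features(X, dim, gamma=1.0, rng=rng)
               for X in users])

# Step 2: Lanczos on Z^T Z yields approximate top eigenvectors of the
# implicit kernel matrix K ≈ Z Z^T without ever materializing K.
k = 3
_, V = eigsh(Z.T @ Z, k=k, which="LM")
embedding = Z @ V  # low-dimensional approximate spectral embedding

# Step 3: linear k-means on the embedding.
def kmeans(E, k, iters, rng):
    centers = E[rng.choice(len(E), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((E[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = E[labels == j].mean(axis=0)
    return labels

labels = kmeans(embedding, k=3, iters=20, rng=rng)
print(np.bincount(labels))  # cluster sizes
```

In the federated setting described by the abstract, step 1 would run on user devices, while steps 2 and 3 would be carried out iteratively via FEA/CELA and a distributed linear k-means, so that only low-dimensional vectors, rather than raw data or a full kernel matrix, are ever communicated.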
