Federated clustering (FC) is an extension of centralized clustering in federated settings. The key here is how to construct a global similarity measure without sharing private data, since the local similarity may be insufficient to group local data correctly, and the similarity of samples across clients cannot be directly measured due to privacy constraints. Obviously, the most straightforward way to analyze FC is to employ methods extended from centralized ones, such as K-means (KM) and fuzzy c-means (FCM). However, they are vulnerable to non independent-and-identically-distributed (non-IID) data among clients. To handle this, we propose a pretty simple and effective federated clustering framework instantiated with generative adversarial network (GAN), named synthetic data aided federated clustering (SDA-FC). It trains generative adversarial network locally in each client and uploads the generated synthetic data to the server, where KM or FCM is performed on the synthetic data. The synthetic data can make the model immune to the non-IID problem and enable us to capture the global similarity characteristics more effectively without sharing private data. Comprehensive experiments reveal the advantages of SDA-FC, including superior performance in addressing the non-IID problem and the device failures. The code is available at https://github.com/Jarvisyan/SDA-FC.
Read full abstract