Federated learning (FL) is an emerging distributed machine learning (ML) framework that operates under privacy and communication constraints. To mitigate the data heterogeneity inherent in FL, clustered FL (CFL) was proposed to learn customized models for different client groups. However, due to the lack of effective client selection strategies, the CFL process is relatively slow, and model performance remains limited in the presence of non-independent and identically distributed (non-IID) client data. In this work, for the first time, we propose selecting participating clients for each cluster with active learning (AL) and call our method active client selection for CFL (ACFL). More specifically, in each ACFL round, each cluster selects a small set of its most informative clients according to an AL metric (e.g., uncertainty sampling, query-by-committee (QBC), or loss) and aggregates only their model updates to update the cluster-specific model. We empirically evaluate ACFL on the public MNIST, CIFAR-10, and LEAF Synthetic datasets under class-imbalanced settings. Compared with several FL and CFL baselines, the results show that ACFL dramatically speeds up learning while requiring less client participation, and significantly improves model accuracy at a relatively low communication overhead.
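
The abstract describes the ACFL round only at a high level. The following minimal sketch, which is not the paper's implementation, illustrates one plausible loss-based variant of per-cluster active client selection; the `Client` interface (`evaluate_loss`, `local_train`, `num_samples`) and the selection budget `num_selected` are illustrative assumptions.

```python
# Minimal sketch of one ACFL-style round with loss-based active client selection.
# The Client interface and all names below are illustrative assumptions, not the paper's API.
import numpy as np

def select_informative_clients(clients, cluster_model, num_selected):
    """Rank a cluster's clients by local loss on the cluster model (one possible AL metric)
    and keep the highest-loss, i.e. most informative, clients."""
    scores = [client.evaluate_loss(cluster_model) for client in clients]
    ranked = np.argsort(scores)[::-1]  # highest loss first
    return [clients[i] for i in ranked[:num_selected]]

def acfl_round(clusters, cluster_models, num_selected):
    """For each cluster, aggregate model updates only from its selected clients (FedAvg-style)."""
    new_models = {}
    for cid, clients in clusters.items():
        selected = select_informative_clients(clients, cluster_models[cid], num_selected)
        updates = [client.local_train(cluster_models[cid]) for client in selected]
        sizes = np.array([client.num_samples for client in selected], dtype=float)
        weights = sizes / sizes.sum()
        # Weighted average of the selected clients' model parameters.
        new_models[cid] = sum(w * u for w, u in zip(weights, updates))
    return new_models
```

Other AL metrics mentioned in the abstract, such as uncertainty sampling or QBC disagreement, would simply replace the loss-based scoring inside `select_informative_clients`.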