Federated learning (FL) can mitigate data silos caused by information asymmetry and reduce the risk of privacy leakage; however, it still suffers from data heterogeneity, high communication cost, and uneven performance distribution across clients. To address these issues and optimize FL parameters on non-independent and identically distributed (non-IID) data, we propose a multi-objective FL parameter optimization method based on hierarchical clustering and the non-dominated sorting genetic algorithm III (NSGA-III), which simultaneously minimizes the global model error rate, the variance of the accuracy distribution across clients, and the communication cost. Applying hierarchical clustering to clients on non-IID data accelerates convergence, allowing the evolutionary search to run with a low client participation ratio and thus reducing the overall communication cost of NSGA-III. In addition, we propose NSGA-III-FD, an NSGA-III variant with fast greedy initialization and a strategy for discarding low-quality individuals, to improve convergence efficiency and the quality of the Pareto-optimal solutions. Under two non-IID data settings, CNN experiments on the MNIST and CIFAR-10 datasets show that our approach obtains better Pareto-optimal solutions than classical evolutionary algorithms, and that the selected solutions with the optimized model achieve a better multi-objective balance than the standard federated averaging (FedAvg) algorithm and a clustering-based FedAvg algorithm.
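To illustrate the multi-objective selection at the core of the method, the following is a minimal Python sketch. The surrogate objective function, the toy genes, and the `keep` heuristic are illustrative assumptions, not the paper's implementation: in the actual method, each candidate's three objectives (global error rate, accuracy variance across clients, communication cost) would be measured from federated training runs.

```python
import random

def evaluate(ind):
    # Hypothetical surrogate for the three objectives; a real evaluation
    # would train/validate the global FL model for this configuration.
    lr, frac = ind                                  # toy genes: learning rate, client fraction
    error = abs(lr - 0.1) + 0.05 / (frac + 0.1)    # toy global error-rate surrogate
    variance = (1.0 - frac) * 0.2                  # fewer clients -> higher accuracy variance
    comm_cost = frac * 100.0                       # more clients -> higher communication cost
    return (error, variance, comm_cost)

def dominates(a, b):
    """a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(objs):
    """Indices of non-dominated individuals (the first Pareto front)."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]

def discard_low_quality(pop, objs, keep):
    """Sketch of a discard strategy: keep the `keep` individuals dominated by
    the fewest others and drop the rest before the next generation."""
    counts = [sum(dominates(b, a) for j, b in enumerate(objs) if j != i)
              for i, a in enumerate(objs)]
    order = sorted(range(len(pop)), key=lambda i: counts[i])
    return [pop[i] for i in order[:keep]]

random.seed(0)
pop = [(random.uniform(0.01, 0.5), random.uniform(0.05, 1.0)) for _ in range(12)]
objs = [evaluate(ind) for ind in pop]
front = first_front(objs)                      # candidate trade-off solutions
survivors = discard_low_quality(pop, objs, 8)  # population carried forward
```

In the full algorithm, this selection step would sit inside the NSGA-III generational loop, with fast greedy initialization seeding the first population.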