Abstract

Federated learning (FL) is a promising distributed learning paradigm in which multiple edge devices (EDs) collaborate to train a shared model without exchanging privacy-sensitive raw data. When the local models in FL have heterogeneous architectures, knowledge distillation (KD) offers a way to handle this model heterogeneity by aggregating knowledge instead of model parameters. However, there is usually no pre-trained teacher model in the FL setup, so applying KD in FL requires an efficient knowledge aggregation mechanism. To this end, in this paper we first analyze the relationship between the convergence rate of each local model and the knowledge generated by selecting predicted logits. Based on this relationship, we formulate an optimization problem that schedules the predicted logits for efficient knowledge aggregation, and we design an iterative algorithm called predicted logits selection (PLS) to solve it. We then propose a threshold-based technique that decides, for each ED, whether the local model is updated with or without KD, in order to limit the performance degradation of local models caused by misleading knowledge; both the threshold and the distillation intensity are optimized during the FL process. Extensive experiments with convolutional neural network (CNN) models on real-world datasets demonstrate the superior performance of the proposed approach over existing benchmark methods.
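
The abstract does not spell out the update rule, so the sketch below is only one plausible way to realize the threshold-based choice between local updates with and without KD, written in PyTorch. The aggregated global logits, the agreement test, and all names and parameters (agreement_threshold, distill_weight, temperature) are illustrative assumptions, not the paper's actual design.

# Hedged sketch (not the authors' code): gate KD during a client-side local update,
# echoing the abstract's "updating options with/without KD" for each ED.
import torch
import torch.nn.functional as F

def local_update(model, optimizer, batch, global_logits,
                 agreement_threshold=0.6, distill_weight=0.5, temperature=2.0):
    """One local step: distill from the aggregated (teacher) logits only if they
    agree with the ground-truth labels often enough; otherwise use plain CE."""
    inputs, labels = batch
    outputs = model(inputs)                      # local model's predicted logits
    ce_loss = F.cross_entropy(outputs, labels)

    # Fraction of samples where the aggregated knowledge predicts the true label.
    agreement = (global_logits.argmax(dim=1) == labels).float().mean()

    if agreement >= agreement_threshold:
        # Distill from the aggregated logits (soft targets) at temperature T;
        # distill_weight plays the role of the distillation intensity.
        kd_loss = F.kl_div(
            F.log_softmax(outputs / temperature, dim=1),
            F.softmax(global_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        loss = (1 - distill_weight) * ce_loss + distill_weight * kd_loss
    else:
        # Aggregated knowledge looks misleading for this batch: update without KD.
        loss = ce_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the paper, the threshold and the distillation intensity are themselves optimized during the FL process; here they are shown as fixed hyperparameters purely for illustration.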
