Abstract

Federated distillation (FD) is a popular novel algorithmic paradigm for federated learning (FL) which achieves training performance competitive with prior parameter-averaging-based methods, while additionally allowing the clients to train different model architectures, by distilling the client predictions on an unlabeled auxiliary set of data into a student model. In this work, we propose FEDAUX, an extension to FD which, under the same set of assumptions, drastically improves performance by deriving maximum utility from the unlabeled auxiliary data. FEDAUX modifies the FD training procedure in two ways: First, unsupervised pre-training on the auxiliary data is performed to find a suitable model initialization for the distributed training. Second, (ε, δ)-differentially private certainty scoring is used to weight the ensemble predictions on the auxiliary data according to the certainty of each client model. Experiments on large-scale convolutional neural networks (CNNs) and transformer models demonstrate that our proposed method achieves remarkable performance improvements over state-of-the-art FL methods, without adding appreciable computation, communication, or privacy cost. For instance, when training ResNet8 on non-independent and identically distributed (non-i.i.d.) subsets of CIFAR10, FEDAUX raises the maximum achieved validation accuracy from 30.4% to 78.1%, further closing the gap to centralized training performance. Code is available at https://github.com/fedl-repo/fedaux.
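
To make the second modification concrete, below is a minimal, illustrative NumPy sketch of certainty-weighted soft-label aggregation for ensemble distillation. It is not the authors' released implementation: the function names, array shapes, and the plain per-sample normalization of the scores are assumptions, and the (ε, δ)-differentially private computation of the certainty scores themselves is omitted.

    # Sketch (not the FEDAUX implementation) of certainty-weighted ensemble
    # distillation targets on an unlabeled auxiliary dataset. In FEDAUX the
    # certainty scores would additionally be computed under (eps, delta)-DP.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def weighted_soft_labels(client_logits, certainty_scores):
        """Aggregate client predictions on auxiliary data into distillation targets.

        client_logits:    shape (n_clients, n_aux_samples, n_classes)
        certainty_scores: shape (n_clients, n_aux_samples), higher = more certain
        """
        probs = softmax(client_logits, axis=-1)                  # per-client soft predictions
        w = certainty_scores / certainty_scores.sum(axis=0, keepdims=True)  # normalize per sample
        return (w[..., None] * probs).sum(axis=0)                # certainty-weighted ensemble

    # Toy usage: 3 clients, 5 auxiliary samples, 10 classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 5, 10))
    scores = rng.uniform(0.1, 1.0, size=(3, 5))
    targets = weighted_soft_labels(logits, scores)  # distill these into the student model
    print(targets.shape)  # (5, 10)

The student model would then be trained to match these aggregated soft labels on the auxiliary data, so that clients whose models are uncertain on a given sample contribute less to its distillation target.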
