Abstract

Batch normalization (BN) has been widely used to accelerate the training of deep neural networks. However, recent findings show that, in federated learning (FL) scenarios, BN can degrade learning performance when clients hold non-i.i.d. data. While several FL schemes have been proposed to address this issue, they still suffer from a significant performance loss compared to centralized training. Moreover, none of them analytically explains how BN affects FL convergence. In this paper, we present the first convergence analysis showing that the mismatch between local and global statistical parameters caused by non-i.i.d. data induces gradient deviation, which leads the algorithm to converge to a biased solution at a slower rate. To remedy this, we further present a new FL algorithm, called FedTAN, based on an iterative layer-wise parameter aggregation procedure. Experimental results demonstrate the superiority of FedTAN.
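
As a rough illustration of the mismatch the abstract refers to (notation is ours and not taken from the paper): a BN layer at client k normalizes activations with that client's local statistics, while a naively aggregated global model carries averaged statistics, and under non-i.i.d. data the two generally differ.

% Illustrative sketch only; symbols (mu_k, sigma_k^2, D_k, K) are assumed notation, not the paper's.
% BN at client k normalizes with local statistics:
\[
\hat{x}^{(k)} = \frac{x - \mu_k}{\sqrt{\sigma_k^2 + \epsilon}},
\qquad
\mu_k = \mathbb{E}_{x \sim \mathcal{D}_k}[x], \quad
\sigma_k^2 = \operatorname{Var}_{x \sim \mathcal{D}_k}(x).
\]
% A simple FedAvg-style aggregation would average these statistics over the K clients:
\[
\mu = \frac{1}{K}\sum_{k=1}^{K} \mu_k, \qquad
\sigma^2 \approx \frac{1}{K}\sum_{k=1}^{K} \sigma_k^2.
\]
% With non-i.i.d. data, (mu_k, sigma_k^2) generally differs from (mu, sigma^2), so local gradients
% are computed through a different normalization than the one used by the global model -- the
% source of the gradient deviation discussed in the abstract.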
