Recent years have witnessed the success of a distributed learning paradigm called Federated Learning (FL). Asynchronous FL (AFL) has recently demonstrated greater concurrency than mainstream synchronous FL. However, inherent system and statistical heterogeneity presents several impediments to AFL: on the client side, discrepancies in client trips and local model drift impede improvement of the global model; on the server side, dynamic communication causes large fluctuations in gradient arrival times, while asynchronously arriving gradients of uncertain value are not fully exploited. In this paper, we propose an adaptive AFL framework, ARDAGH, that systematically addresses these challenges. First, to handle the discrepancies in client trips, ARDAGH ensures convergence by adding only one bit of feedback information to the downlink. Second, to counter client drift, ARDAGH generalizes the local models through our novel adversarial sharpness-aware minimization, which requires no additional global variables. Third, to cope with gradient latency, ARDAGH applies a communication-aware dropout strategy that adaptively compresses gradients so that transmission times remain comparable. Finally, to fully exploit every gradient, we establish a consistent optimization direction by treating aggregation as an optimizer with successive momentum. Building on ARDAGH, we derive an algorithm named FedAMO, whose superiority is confirmed by experiments under challenging prototype and simulation settings. In particular, on typical sentiment analysis tasks, FedAMO achieves an accuracy improvement of up to 5.351% and a 20.056-fold acceleration over conventional asynchronous methods.
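To illustrate the last idea, the sketch below shows one way to treat server-side aggregation as an optimizer with successive momentum. This is a minimal hypothetical example, not the paper's algorithm: the class name, the staleness discount rule, and the hyperparameters are assumptions made only for illustration.

```python
import numpy as np

# Minimal sketch (not the paper's method): asynchronous aggregation viewed
# as a server-side optimizer with successive momentum.
# Assumptions: each arriving client update is a pseudo-gradient `delta`,
# a staleness-dependent weight `alpha` discounts late arrivals (assumed
# rule: 1 / (1 + staleness)), and the server applies momentum SGD.

class MomentumAggregator:
    def __init__(self, model, lr=1.0, beta=0.9):
        self.model = model                      # global model parameters
        self.velocity = np.zeros_like(model)    # accumulated momentum
        self.lr = lr                            # server learning rate
        self.beta = beta                        # momentum coefficient

    def step(self, delta, staleness):
        # Discount stale gradients so late arrivals contribute less.
        alpha = 1.0 / (1.0 + staleness)
        # Successive momentum keeps the update direction consistent
        # even when individual asynchronous gradients are noisy.
        self.velocity = self.beta * self.velocity + alpha * delta
        self.model -= self.lr * self.velocity
        return self.model


# Usage sketch: apply each client update as soon as it arrives.
agg = MomentumAggregator(model=np.zeros(10))
global_model = agg.step(delta=np.ones(10) * 0.1, staleness=3)
```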