Abstract

As the Artificial Intelligence of Things (AIoT) becomes increasingly important for modern AI applications, federated learning (FL) is envisioned as the enabling technology for AIoT, especially in large-scale, data-privacy-preserving scenarios. However, most existing FL is managed in a centralized manner (CFL), which faces scalability limitations given the explosion of AIoT devices. The key challenge for CFL is the communication bottleneck at the central model aggregation server, which incurs high server-to-worker communication delay and thus severely slows down model convergence. To address this challenge, this article introduces a generic decentralized FL (DFL) framework that can operate in either synchronous (Sync-DFL) or asynchronous (Async-DFL) mode to alleviate the communication congestion around the central server. Moreover, Async-DFL is the first DFL framework in the literature that is fully asynchronous and completely avoids worker waiting, enabling robust distributed model training in inherently heterogeneous IoT environments, where stragglers (i.e., slow devices) are common due to the widely varying computing and networking speeds of IoT devices. Our DFL framework is implemented, deployed, and evaluated on both simulation and physical testbeds. The results show that Async-DFL accelerates model training convergence to twice the speed of CFL, while maintaining convergence accuracy and effectively combating the impact of stragglers.
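
To make the asynchronous idea concrete, the following is a minimal Python sketch, not the authors' implementation: it illustrates gossip-style asynchronous model averaging in which each worker mixes its parameters with whatever neighbor models have already arrived and never blocks on stragglers. All names here (Worker, local_step, receive, mix) and the plain-averaging rule are illustrative assumptions.

    import numpy as np

    class Worker:
        """One AIoT device: trains locally, mixes with neighbors asynchronously."""

        def __init__(self, model_dim, lr=0.01):
            self.w = np.zeros(model_dim)   # local model parameters
            self.lr = lr
            self.inbox = []                # neighbor models received so far

        def local_step(self, grad_fn):
            # One local SGD step on this worker's private data.
            self.w -= self.lr * grad_fn(self.w)

        def receive(self, neighbor_w):
            # Invoked whenever a neighbor's model arrives, at any time.
            self.inbox.append(neighbor_w.copy())

        def mix(self):
            # Average with whatever has arrived; an empty inbox is fine,
            # so a fast worker never waits on a slow one (no straggler stall).
            if self.inbox:
                self.w = np.vstack([self.w] + self.inbox).mean(axis=0)
                self.inbox.clear()

Each worker simply alternates local_step and mix, so a straggler's model is absorbed whenever it eventually arrives. In practice, asynchronous schemes typically also down-weight stale models, which is one reason a fully asynchronous design such as Async-DFL needs care to preserve convergence accuracy.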
