Federated learning (FL), particularly when data is distributed across multiple clients, helps reduce learning time by avoiding training on a massive, centralized accumulation of data. Nonetheless, low computation capacity or poor network conditions can lengthen convergence time, thereby degrading accuracy and learning performance. In this paper, we propose a framework to deploy FL clients in a network while compensating for end-to-end delay variations caused by heterogeneous network settings. We present a new distributed learning control scheme, named In-network Federated Learning Control (IFLC), which supports the operation of distributed federated learning functions in geographically distributed networks and is designed to mitigate stragglers at lower deployment cost. IFLC adapts the allocation of distributed hardware accelerators to modulate the contribution of local training latency to the end-to-end delay of federated learning applications, considering both deterministic and stochastic delay scenarios. Through extensive simulations on realistic instances of an in-network anomaly detection application, we show that the absence of hardware accelerators can severely impair learning efficiency. We also show that providing hardware accelerators at only 50% of the nodes reduces the number of stragglers by at least 50% and up to 100% with respect to a baseline FIRST-FIT algorithm, while lowering the deployment cost by up to 30% compared to the case without hardware accelerators. Finally, we explore the effect of topology changes on IFLC across both hierarchical and flat topologies.