Abstract

Federated Learning (FL) has shown great potential as a privacy-preserving solution for learning from decentralized data that are only accessible to end devices (i.e., clients). The data locality constraint offers strong privacy protection but also makes FL sensitive to the condition of local data. Apart from statistical heterogeneity, in many scenarios a large proportion of clients may hold low-quality data that are biased, noisy, or even irrelevant. Such clients can significantly slow the convergence of the global model and compromise its quality. In light of this, we first present a new view of local data by examining the representation space, observing that pre-activation representations converge in distribution to Normal distributions; we provide theoretical analysis to support this finding. Further, we propose FedProf, a novel algorithm for optimizing FL over non-IID data of mixed quality. The key to our approach is a distributional representation profiling and matching scheme that uses the global model to dynamically profile data representations and allows for low-cost, lightweight representation matching. Using this scheme, we sample clients adaptively in FL to mitigate the impact of low-quality data on the training process. We evaluated our solution with extensive experiments on different tasks and data conditions under various FL settings. The results demonstrate that the selective behavior of our algorithm significantly reduces the number of communication rounds and the time (up to 2.4× speedup) needed for the global model to converge, and also yields accuracy gains.
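To make the profiling-and-matching idea concrete, below is a minimal sketch, not the paper's implementation: it assumes Gaussian profiles fitted per hidden unit to pre-activation values, a closed-form KL divergence between the Gaussian profiles, and exponential down-weighting of divergent clients for sampling. All function names and the `beta` parameter are illustrative assumptions.

```python
# Hypothetical sketch of distributional representation profiling and matching.
# Assumptions (not from the paper): per-unit Gaussian profiles, KL-based
# matching, and exponential down-weighting for client sampling.
import numpy as np

def profile_representations(pre_activations):
    """Fit a Normal profile (mean, std) to each hidden unit's pre-activation
    values; pre_activations has shape (n_samples, n_units)."""
    mu = pre_activations.mean(axis=0)
    sigma = pre_activations.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def kl_gaussian(mu_p, sig_p, mu_q, sig_q):
    """KL divergence between two univariate Gaussians, summed over units."""
    return np.sum(np.log(sig_q / sig_p)
                  + (sig_p**2 + (mu_p - mu_q)**2) / (2 * sig_q**2) - 0.5)

def sampling_weights(client_profiles, global_profile, beta=1.0):
    """Turn profile divergences into client-sampling probabilities: clients
    whose local profile diverges more from the global one (e.g., due to
    noisy or irrelevant data) are sampled less often."""
    mu_g, sig_g = global_profile
    div = np.array([kl_gaussian(mu, sig, mu_g, sig_g)
                    for mu, sig in client_profiles])
    w = np.exp(-beta * div)  # exponential down-weighting of divergent clients
    return w / w.sum()

# Toy usage: 3 clients with 16 hidden units; the third holds noisier data.
rng = np.random.default_rng(0)
global_profile = profile_representations(rng.normal(0, 1, (1000, 16)))
client_profiles = [
    profile_representations(rng.normal(0.0, 1.0, (200, 16))),
    profile_representations(rng.normal(0.1, 1.0, (200, 16))),
    profile_representations(rng.normal(1.5, 3.0, (200, 16))),  # low quality
]
print(sampling_weights(client_profiles, global_profile))
```

In this sketch the low-quality client receives a near-zero sampling probability, illustrating how representation matching can steer client selection without ever inspecting the raw local data.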
