Abstract

We consider the distributed learning problem in which a network of n agents seeks to minimize a global function F. Agents have access to F through noisy gradients, and they can communicate locally with their neighbors over an undirected network. We study the Decentralized Local SGD method, where agents perform a number of local gradient steps and occasionally exchange information with their neighbors. Previous analyses have focused on a specific network topology (the star topology), in which a leader node aggregates all agents’ information. We generalize that setting to an arbitrary undirected network by analyzing the trade-off between the number of communication rounds and the computational effort of each agent. We bound the expected optimality gap in terms of the number of iterations T, the number of agents n, and the spectral gap of the underlying network. Our main results show that using only R = Ω(n) communication rounds, one can achieve an error that scales as O(1/nT), where the number of communication rounds is independent of T and depends only on the number of agents.
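The following is a minimal sketch of the Decentralized Local SGD scheme described above: each agent takes local noisy gradient steps and, every few iterations, averages its iterate with its neighbors through a doubly stochastic mixing matrix. The quadratic objective F(x) = ½‖x‖², the ring topology, and all hyperparameters (step size, local-step count H, noise level) are illustrative assumptions, not values from the paper.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n agents (self + two neighbors)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W


def noisy_grad(x, rng, sigma=0.1):
    """Stochastic gradient of F(x) = 0.5 * ||x||^2 with additive Gaussian noise."""
    return x + sigma * rng.standard_normal(x.shape)


def decentralized_local_sgd(n=8, d=5, T=1000, H=10, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n)
    X = rng.standard_normal((n, d))      # one local iterate per agent
    for t in range(1, T + 1):
        for i in range(n):               # local stochastic gradient step at each agent
            X[i] -= lr * noisy_grad(X[i], rng)
        if t % H == 0:                   # communication round: gossip averaging with neighbors
            X = W @ X
    return X.mean(axis=0)                # averaged iterate across agents


if __name__ == "__main__":
    x_bar = decentralized_local_sgd()
    print("||x_bar|| =", np.linalg.norm(x_bar))  # should be close to the minimizer x* = 0
```

In this toy setting the communication rounds are triggered every H local steps; the paper's result concerns how few such rounds (R = Ω(n), independent of T) suffice to retain the O(1/nT) rate.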
