Abstract
We design a new variant of the stochastic gradient descent (SGD) algorithm for learning a global model from data distributed over the nodes of a network. Motivated by settings such as decentralized learning, we suppose that one special node in the network, which we call node 1, is interested in learning the global model. We seek a decentralized and distributed algorithm for reasons including privacy and fault tolerance. A natural candidate is Gossip-style SGD. However, it suffers from slow convergence and high communication cost, mainly because in the end all nodes, not only the special node, learn the model. We propose a distributed SGD algorithm that uses a weighted random walk to sample the nodes. The Markov chain is designed to have a stationary distribution proportional to the smoothness bound L_i of the local loss function at node i. We study the convergence rate of this algorithm and prove that it depends on the average L̄ of the smoothness constants. This outperforms the uniform-sampling algorithm obtained by a Metropolis-Hastings random walk (MHRW), whose rate depends on the maximum L_max of all the L_i's. We present numerical simulations that substantiate our theoretical findings and show that our algorithm outperforms random-walk and gossip-style algorithms.
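Below is a minimal Python sketch, not the paper's implementation, of the idea described above: a Metropolis-Hastings-style random walk over the network whose stationary distribution is proportional to the smoothness constants L_i, combined with an SGD step at each visited node. The names (graph, local_grads, L, etc.) are hypothetical, and the importance-weight correction 1/(n * pi_i) is one standard way to keep the update unbiased under non-uniform sampling.

```python
import numpy as np

def weighted_random_walk_sgd(graph, local_grads, L, x0, step_size, num_steps,
                             start=0, rng=None):
    """Sketch of random-walk SGD whose Markov chain targets pi_i proportional to L_i.

    graph       : dict mapping node -> list of neighbor nodes
    local_grads : list of callables, local_grads[i](x) = gradient of the loss at node i
    L           : array of smoothness constants L_i (assumed known or estimated)
    """
    rng = rng or np.random.default_rng()
    n = len(L)
    pi = L / L.sum()                  # target stationary distribution, pi_i ∝ L_i
    x, i = x0.copy(), start
    for _ in range(num_steps):
        # Importance-weighted SGD step at the current node; dividing by n*pi_i
        # keeps the update unbiased when nodes are sampled from pi, not uniformly.
        x -= step_size * local_grads[i](x) / (n * pi[i])
        # Metropolis-Hastings move: propose a uniform neighbor and accept with a
        # ratio that corrects for both the target weights and the node degrees.
        j = rng.choice(graph[i])
        accept = min(1.0, (L[j] * len(graph[i])) / (L[i] * len(graph[j])))
        if rng.random() < accept:
            i = j
    return x
```

With a uniform-neighbor proposal, the acceptance ratio min(1, L_j deg(i) / (L_i deg(j))) makes the walk reversible with respect to pi_i ∝ L_i, which is how the chain attains the weighted stationary distribution; setting all L_i equal recovers the uniform-sampling MHRW baseline discussed in the abstract.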