Abstract
The computational efficiency of asynchronous stochastic gradient descent (ASGD) over its synchronous counterpart has been well documented in recent works. Unfortunately, ASGD typically applies only to settings in which all workers retrieve data from a shared dataset. As data grow larger and more distributed, new ideas are urgently needed to maintain the efficiency of ASGD for decentralized training. This article proposes a novel ASGD method for decentralized datasets, where each worker can access only its local, privacy-preserved dataset. We first observe that, due to the heterogeneity of decentralized datasets and/or workers, ASGD may progress in wrong directions, leading to undesired solutions. To tackle this issue, we propose a decentralized asynchronous stochastic gradient descent (DASGD) method that weights the stochastic gradients via the importance sampling technique. We prove that DASGD achieves a convergence rate of O(1/K^{1/2}) on nonconvex training problems under mild conditions. Numerical results further substantiate the performance of the proposed algorithm.
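To illustrate the importance-sampling idea described above, the following is a minimal sketch, not the paper's exact DASGD algorithm. It simulates a toy least-squares problem whose data are split into heterogeneous local shards, with workers firing updates at unequal rates; each local stochastic gradient is reweighted so that its expectation matches the gradient of the global objective. The shard sizes, update probabilities `p`, learning rate, and the omission of gradient staleness are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact DASGD):
# importance-sampling-weighted asynchronous-style SGD on a toy
# least-squares problem with heterogeneous, decentralized shards.
import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 3000
x_true = rng.normal(size=d)

# Heterogeneous shards: unequal sizes and shifted feature distributions.
shard_sizes = [1800, 900, 300]
shards = []
for i, n_i in enumerate(shard_sizes):
    A = rng.normal(loc=i, size=(n_i, d))
    b = A @ x_true + 0.1 * rng.normal(size=n_i)
    shards.append((A, b))

# Worker i delivers an update with probability p[i] per iteration
# (a crude stand-in for unequal worker speeds; staleness is ignored).
p = np.array([0.2, 0.3, 0.5])

x = np.zeros(d)
lr = 1e-3
K = 20000
for k in range(K):
    i = rng.choice(3, p=p)              # which worker's update arrives
    A, b = shards[i]
    j = rng.integers(len(b))            # worker samples from its local shard
    g_local = (A[j] @ x - b[j]) * A[j]  # stochastic gradient on that sample
    # Importance-sampling weight: corrects for both shard size and update
    # rate, so E[weighted gradient] equals the average gradient over all N
    # samples rather than a direction biased toward fast or data-rich workers.
    w = (shard_sizes[i] / N) / p[i]
    x -= lr * w * g_local

print("distance to x_true:", np.linalg.norm(x - x_true))
```

Without the weight `w`, the expected update is skewed toward the shards of workers that update more often, which is the "wrong direction" phenomenon the abstract refers to; with the weight, the bias cancels in expectation.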