Abstract

Distributed optimization is widely used to solve large-scale optimization problems by parallelizing gradient-based algorithms across multiple computing nodes. In asynchronous optimization, the optimization parameter is updated using stale gradients, which are gradients computed with respect to outdated parameters. Although large degrees of staleness can slow convergence, little is known about the impact of staleness and its relation to other system parameters. In this work, we analyze asynchronous optimization when implemented using either hub-and-spoke or shared memory architectures. We show that the process of gradient arrival to the master node is similar in nature to a renewal process. We derive the bandwidth requirement of the system. For the hub-and-spoke setup, we derive bounds on the expected gradient staleness and show its connection to other system parameters such as the number of workers, expected compute time, and communication delays. Our derivations reveal that it is possible to adjust gradient staleness by tuning certain parameters such as minibatch size or the number of workers. For the shared memory architecture, we show that the expected staleness is equivalent to the number of workers. Our derivations can be used in existing convergence analyses to express convergence rates in terms of other known system parameters. Such an expression gives further details on what factors impact convergence.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.