Abstract

With the growth of data and necessity for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on developing highly specific methods for the distributed environment. These special-purpose methods are often unable to fully leverage the competitive performance of their well-tuned and customized single machine counterparts. Further, they are unable to easily integrate improvements that continue to be made to single machine methods. To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods. We give strong primal–dual convergence rate guarantees for our framework that hold for arbitrary local solvers. We demonstrate the impact of local solver selection both theoretically and in an extensive experimental comparison. Finally, we provide thorough implementation details for our framework, highlighting areas for practical performance gains.

Highlights

  • Background and problem formulation: To provide context for our framework, we first state traditional complexity measures and convergence rates for single machine algorithms, and demonstrate that these must be adapted to more accurately represent the performance of an algorithm in the distributed setting. When running an iterative optimization algorithm A on a single machine, its performance is typically measured by the total runtime: TIME(A) = I_A(ε) × T_A. (T-A) Here, T_A stands for the time it takes to perform a single iteration of algorithm A, and I_A(ε) is the number of iterations A needs to attain an ε-accurate objective. On a single machine, most state-of-the-art first-order optimization methods achieve fast convergence in practice in terms of (T-A) by performing a large number of relatively fast iterations

  • We review a number of methods designed to solve optimization problems of the form of interest here, which are typically referred to as regularized empirical risk minimization (ERM) problems in the machine learning literature

  • The known convergence rates for the alternating direction method of multipliers (ADMM) are weaker than those of the more problem-tailored methods we study here, and the choice of the penalty parameter is often unclear in practice

Summary

Introduction

To provide context for our framework, we first state traditional complexity measures and convergence rates for single machine algorithms, and demonstrate that these must be adapted to more accurately represent the performance of an algorithm in the distributed setting. When running an iterative optimization algorithm A on a single machine, its performance is typically measured by the total runtime: TIME(A) = I_A(ε) × T_A, where T_A is the time per iteration and I_A(ε) is the number of iterations A needs to attain an ε-accurate objective. Most state-of-the-art first-order optimization methods achieve fast convergence in practice in terms of this measure by performing a large number of relatively fast iterations. In the distributed setting, however, the time to communicate between two machines can be several orders of magnitude slower than even a single iteration of such an algorithm. Distributed timing can be more accurately illustrated using the following practical distributed efficiency model (see [37]): TIME(A) = I_A(ε) × (c + T_A), where c is the time required for a round of communication.
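The trade-off implied by the two runtime models above can be made concrete with a small numerical sketch. The functions below encode the single-machine model TIME(A) = I_A(ε) × T_A and the distributed model in which every round additionally pays a communication cost c. All numeric values are hypothetical, chosen only to illustrate why a method with many cheap iterations, which is attractive on a single machine, can be dominated by communication in the distributed setting.

```python
def single_machine_time(iterations: int, time_per_iter: float) -> float:
    """Single-machine model: TIME(A) = I_A(eps) * T_A."""
    return iterations * time_per_iter


def distributed_time(rounds: int, time_per_round: float, comm_cost: float) -> float:
    """Distributed model: every round pays both local work and communication,
    i.e. TIME(A) = I_A(eps) * (c + T_A)."""
    return rounds * (comm_cost + time_per_round)


# Hypothetical first-order method: many cheap iterations (1 ms each),
# but each distributed round also pays 100 ms of communication.
fast_iters = distributed_time(rounds=10_000, time_per_round=0.001, comm_cost=0.1)

# Hypothetical method doing more local work per round: fewer, costlier rounds.
more_local_work = distributed_time(rounds=100, time_per_round=0.05, comm_cost=0.1)

# With c >> T_A, the cheap-iteration method is dominated by communication,
# even though it would win on a single machine.
print(fast_iters, more_local_work)
```

This is only a back-of-the-envelope sketch of the efficiency model, not part of the framework itself; it shows why shifting work into the local solver (raising T_A to lower I_A(ε)) pays off once the communication cost c dominates.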
