Abstract

Accelerated gradient methods can achieve optimal convergence rates and have been used successfully in many practical applications. Despite this fact, the rationale underlying these accelerated methods remains elusive. In this work, we study gradient-based accelerated optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the Bregman Lagrangian. We show that for sufficiently smooth objectives, acceleration can be achieved by discretizing the proposed ODE using $s$-stage, $q$-order implicit Runge-Kutta integrators. In particular, we prove that under the assumptions of convexity and sufficient smoothness, the sequence of iterates generated by the proposed accelerated method converges stably to the optimal solution at a rate of $O\left(\left(1-\tilde{C}_{p,q}\cdot\frac{\mu}{L}\right)^{N} N^{-p}\right)$, where $p \geq 2$ is the parameter in the second-order ODE and $\tilde{C}_{p,q}$ is a constant depending on $p$ and $q$. Several numerical experiments are given to verify the convergence results.
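
To make the construction concrete, here is a minimal sketch, not the authors' implementation: it applies the implicit midpoint rule (a 1-stage, second-order implicit Runge-Kutta method, i.e. $s = 1$, $q = 2$) to a representative Euclidean instance of the Bregman-Lagrangian ODE, $\ddot{x}(t) + \frac{p+1}{t}\dot{x}(t) + Cp^{2}t^{p-2}\nabla f(x(t)) = 0$, for a strongly convex quadratic. The test problem, the constants $p$ and $C$, the step size $h$, and the fixed-point inner solver are illustrative assumptions; the paper's exact ODE, integrator order $q$, and step-size rule are those stated in its theorems.

```python
# Minimal sketch (an illustration, not the paper's exact algorithm or constants):
# apply the implicit midpoint rule (a 1-stage, 2nd-order implicit Runge-Kutta
# method) to a representative Euclidean instance of the Bregman-Lagrangian ODE
#     x''(t) + (p+1)/t * x'(t) + C * p^2 * t^(p-2) * grad f(x(t)) = 0
# for a strongly convex quadratic objective.
import numpy as np

def make_problem(d=10, seed=0):
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((d, d))
    A = Q.T @ Q / d + np.eye(d)        # positive definite => strongly convex f
    f = lambda x: 0.5 * x @ A @ x      # minimizer x* = 0, so f(x*) = 0
    grad_f = lambda x: A @ x
    return f, grad_f, d

def ode_rhs(t, z, grad_f, p, C):
    """Right-hand side of the first-order system z = (x, v)."""
    x, v = z
    a = -(p + 1) / t * v - C * p**2 * t**(p - 2) * grad_f(x)
    return np.array([v, a])

def implicit_midpoint_step(t, z, h, grad_f, p, C, inner_iters=50):
    """One implicit-midpoint step, solved here by simple fixed-point iteration
    (valid for small enough h; the paper analyses general s-stage methods)."""
    z_new = z.copy()
    for _ in range(inner_iters):
        z_mid = 0.5 * (z + z_new)
        z_new = z + h * ode_rhs(t + 0.5 * h, z_mid, grad_f, p, C)
    return z_new

f, grad_f, d = make_problem()
p, C, h = 2, 1.0, 0.05                       # illustrative choices, not the paper's rule
z = np.array([np.ones(d), np.zeros(d)])      # initial state: x(1) = all-ones, x'(1) = 0
t = 1.0                                      # start at t = 1 to avoid the singularity at t = 0
for _ in range(2000):
    z = implicit_midpoint_step(t, z, h, grad_f, p, C)
    t += h
print("f(x_N) - f(x*) =", f(z[0]))           # should be small
```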

Highlights

  • Numerous problems in machine learning [1], system identification [2] and optimal control [3]–[5] involve minimizing convex and strongly convex functions

  • It is shown that high-resolution ordinary differential equations (ODEs), combined with a general Lyapunov function framework, enable the analysis of accelerated convergence rates of Nesterov’s accelerated gradient descent (NAG)

  • From (37), we have $f(x_N) - f(x^*) \leq C\left(1 - C_{p,q}\cdot\frac{\mu}{L}\right)^{N} N^{-p}$. It is noted from (28) that when the number of iterations $N$ tends to infinity, $\left(1 - C_{p,q}\frac{\mu}{L}\right)^{N}$ is a higher-order infinitesimal, smaller than $N^{-p}$, indicating that the final convergence rate of the algorithm is mainly determined by $\left(1 - C_{p,q}\frac{\mu}{L}\right)^{N}$ (see the short check after this list). According to the convergence conclusion of Theorem 21, when $p$ is fixed, $h$ is positively correlated with $q$, and as $q$ increases the step size $h$ can be taken from a larger range
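
As a quick sanity check on that claim (my own addition, writing $c := C_{p,q}\,\mu/L$ and assuming $c \in (0,1)$), the geometric factor indeed dominates the polynomial one:

$$\ln\!\left[\left(1-c\right)^{N} N^{-p}\right] = N\ln(1-c) - p\ln N \sim N\ln(1-c) \quad (N \to \infty),$$

so the bound behaves asymptotically like the linear-rate factor $(1-c)^{N}$, and $(1-c)^{N}$ itself is $o\!\left(N^{-p}\right)$.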


Summary

INTRODUCTION

Numerous problems in machine learning [1], system identification [2] and optimal control [3]–[5] involve minimizing convex and strongly convex functions. The authors of [17] further extended Nesterov’s accelerated gradient descent (NAG) to globally convex and quasi-strongly convex objectives and obtained linear convergence rates. Su et al. showed in [19] that the continuous limit of the NAG method is a second-order ordinary differential equation (ODE) describing a physical system with vanishing friction. The convergence rate of NAG has also been analyzed through a discrete version of a Lyapunov function. It is shown that high-resolution ODEs, combined with a general Lyapunov function framework, enable the analysis of accelerated convergence rates of NAG.
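
For reference, the limiting ODE obtained by Su et al. in [19] for NAG (with the standard momentum coefficient $\frac{k-1}{k+2}$) is

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0, \qquad X(0) = x_0,\ \dot{X}(0) = 0,$$

whose trajectory satisfies $f(X(t)) - f(x^*) = O(1/t^{2})$, matching the $O(1/k^{2})$ rate of the discrete method.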

PROBLEM FORMULATION AND PRELIMINARY
ELEMENTARY DIFFERENTIALS
CHOOSING STEP SIZE h
BOUNDEDNESS OF DERIVATIVES
NUMERICAL EXPERIMENTS
DIFFERENT ODES
CONCLUSION AND DISCUSSION
