Abstract
This paper introduces the Runge–Kutta Chebyshev descent method (RKCD) for strongly convex optimisation problems. The new algorithm is based on explicit stabilised integrators for stiff differential equations, a powerful class of numerical schemes that avoid the severe step size restriction faced by standard explicit integrators. For quadratic and strongly convex objective functions, the paper proves that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and that the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens. This rate is also obtained for a partitioned variant of RKCD applied to perturbations of quadratic functions. In addition, numerical experiments on general strongly convex problems show that RKCD outperforms Nesterov's accelerated gradient descent.
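To make the construction concrete, here is a minimal sketch of the kind of update such a method is built on: one s-stage, first-order Runge–Kutta–Chebyshev (RKC) step applied to the gradient flow x'(t) = -∇f(x(t)). The function names (cheb_T_and_dT, rkc_gradient_step), the damping parameter η and its default value are our own illustrative choices; the coefficients of the paper's Algorithm 1 may differ in detail.

```python
import numpy as np

def cheb_T_and_dT(s, w0):
    """Chebyshev polynomials T_j(w0) and derivatives T_j'(w0) for j = 0..s."""
    T, dT = np.empty(s + 1), np.empty(s + 1)
    T[0], T[1] = 1.0, w0
    dT[0], dT[1] = 0.0, 1.0
    for j in range(2, s + 1):
        T[j] = 2.0 * w0 * T[j - 1] - T[j - 2]
        dT[j] = 2.0 * T[j - 1] + 2.0 * w0 * dT[j - 1] - dT[j - 2]
    return T, dT

def rkc_gradient_step(x, grad, h, s, eta=0.05):
    """One s-stage Chebyshev-stabilised explicit step for x' = -grad(x).

    Classical damped first-order RKC recurrence; a hedged sketch of the
    type of update RKCD builds on, not the paper's exact Algorithm 1.
    """
    w0 = 1.0 + eta / s**2                    # damping shift
    T, dT = cheb_T_and_dT(s, w0)
    w1 = T[s] / dT[s]
    b = 1.0 / T                              # b_j = 1 / T_j(w0)

    k_prev2 = x
    k_prev1 = x - h * (w1 / w0) * grad(x)    # first internal stage
    for j in range(2, s + 1):
        mu = 2.0 * w1 * b[j] / b[j - 1]
        nu = 2.0 * w0 * b[j] / b[j - 1]
        kappa = -b[j] / b[j - 2]             # nu + kappa = 1 (consistency)
        k = nu * k_prev1 + kappa * k_prev2 - h * mu * grad(k_prev1)
        k_prev2, k_prev1 = k_prev1, k
    return k_prev1                           # iterate after one stabilised step
```

For a quadratic objective with largest eigenvalue L, one such step is stable whenever h·L ≤ (1 + ω0)/ω1, a quantity that grows roughly like 2s² for small damping; so s gradient evaluations buy a step about s² times longer than that of plain explicit Euler (gradient descent), which is the step size gain the abstract alludes to.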
Highlights
Optimisation is at the heart of many applied mathematical and statistical problems, while its beauty lies in the simplicity of describing the problem in question
Comparing the two algorithms in Sect. 5, we find that the Runge–Kutta Chebyshev descent method (RKCD) outperforms accelerated gradient descent (AGD), namely c_RKCD ≤ c_AGD for η ≥ η0, where η0 ≈ 1.17 is a constant of moderate size
We also consider a modification of Algorithm 1 designed for the minimisation of composite functions of the form (4.1)
We call this method the partitioned Runge–Kutta Chebyshev descent method (PRKCD) and show in Proposition 3 that it matches the rate given by the analysis of quadratic problems
Summary
Optimisation is at the heart of many applied mathematical and statistical problems, while its beauty lies in the simplicity of describing the problem in question. Inspired by [16], RKCD uses explicit stabilised methods [1,5,18] to discretise the gradient flow (1.2). Explicit stabilised methods provide a computationally efficient alternative to the implicit Euler method for stiff differential equations, where standard explicit integrators face a severe step size restriction, in particular for spatial discretisations of high-dimensional diffusion PDEs; see the review [2]. Discrete gradient methods were used in [8] for the integration of (1.2) and shown to have properties similar to gradient descent for (strongly) convex objective functions. The work in [24] considers numerical discretisations of a rescaled version of the gradient flow (1.2) and shows that acceleration can be achieved when extra smoothness assumptions are imposed on the objective function f. The paper concludes with an overview of the remaining theoretical challenges.
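As a rough illustration of the comparison with accelerated gradient descent mentioned above, the snippet below (reusing the hypothetical cheb_T_and_dT and rkc_gradient_step from the sketch after the abstract) runs the stabilised iteration on an ill-conditioned random quadratic against a standard constant-momentum Nesterov scheme on the same gradient-evaluation budget. The way η, s and h are chosen here, so that the damped stability interval covers the whole spectrum [μ, L], is our own reading of the analysis; the constants are illustrative, not the paper's prescription.

```python
import numpy as np
# Assumes cheb_T_and_dT and rkc_gradient_step from the earlier sketch are in scope.

rng = np.random.default_rng(0)
n, mu, L = 500, 1e-3, 1.0                      # condition number L/mu = 1000
lam = np.concatenate(([mu, L], rng.uniform(mu, L, n - 2)))
grad = lambda x: lam * x                       # f(x) = 0.5 * sum(lam * x**2)
f = lambda x: 0.5 * np.sum(lam * x**2)

# Illustrative parameter choice: fix the damping eta, then take enough stages s
# that the damped stability interval covers h * [mu, L].
eta = 1.2
s = int(np.ceil(np.sqrt(eta * (L / mu - 1.0) / 2.0)))
w0 = 1.0 + eta / s**2
T, dT = cheb_T_and_dT(s, w0)
w1 = T[s] / dT[s]
h = (w0 + 1.0) / (w1 * L)                      # largest step keeping the stiffest mode stable

x = np.ones(n)
outer = 10                                     # each outer step costs s gradient evaluations
for _ in range(outer):
    x = rkc_gradient_step(x, grad, h, s, eta)
print(f"stabilised iteration: {outer * s} gradients, f(x) = {f(x):.3e}")

# Baseline: Nesterov's accelerated gradient descent with the same gradient budget.
y = x_agd = x_prev = np.ones(n)
q = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
for _ in range(outer * s):
    x_agd = y - grad(y) / L
    y = x_agd + q * (x_agd - x_prev)
    x_prev = x_agd
print(f"AGD:                  {outer * s} gradients, f(x) = {f(x_agd):.3e}")
```

With the stability interval covering the whole spectrum, every eigenvalue component is damped by roughly 1/T_s(ω0) per outer step at a cost of only s gradients, which is the mechanism behind the near conjugate-gradient rate claimed in the abstract.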