A globally convergent gradient method with momentum

Abstract

In this work, we consider smooth unconstrained optimization problems and deal with the class of gradient methods with momentum, i.e., descent algorithms whose search direction is a linear combination of the current gradient and the preceding search direction. This family of algorithms includes nonlinear conjugate gradient methods and Polyak's heavy-ball approach, and is thus of high practical and theoretical interest in large-scale nonlinear optimization. We propose a general framework where the two scalars of the linear combination defining the search direction are computed simultaneously by minimizing an approximate quadratic model over the corresponding two-dimensional subspace. This strategy allows us to define a class of gradient methods with momentum enjoying global convergence guarantees and an optimal worst-case complexity bound in the nonconvex setting. Unlike related works in the literature, the convergence conditions are stated in terms of the Hessian matrix of the two-dimensional quadratic model. To the best of our knowledge, these results are new to the literature. Moreover, extensive computational experiments show that the gradient method with momentum presented here is competitive with other popular solvers for nonconvex unconstrained problems.
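As a rough illustration of the idea (a minimal sketch, not the paper's exact algorithm: the quadratic test problem, the basis handling, and the unit step length are assumptions made here), the two scalars can be obtained by solving the 2x2 linear system that results from restricting the quadratic model to the subspace spanned by the negative gradient and the previous direction:

```python
import numpy as np

def momentum_direction(g, d_prev, H):
    """Direction d = alpha * (-g) + beta * d_prev minimizing the quadratic
    model m(d) = g^T d + 0.5 * d^T H d over span{-g, d_prev}."""
    U = np.column_stack([-g, d_prev])            # basis of the 2-D subspace
    A = U.T @ H @ U                              # 2x2 reduced Hessian
    b = -(U.T @ g)                               # reduced linear term
    coeffs = np.linalg.lstsq(A, b, rcond=None)[0]  # (alpha, beta); lstsq
    return U @ coeffs                            # handles the first iteration,
                                                 # where d_prev = 0 makes A singular

# Usage on a strictly convex quadratic f(x) = 0.5 * x^T H x - c^T x,
# where the model Hessian is exact and a unit step is taken.
H = np.array([[3.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 2.0])
x = np.zeros(2)
d = np.zeros(2)
for _ in range(20):
    g = H @ x - c
    if np.linalg.norm(g) < 1e-10:
        break
    d = momentum_direction(g, d, H)
    x = x + d
```

On this quadratic, minimizing the model over the two-dimensional subspace at every iteration drives the gradient to zero in a few steps; in general, H would be a Hessian approximation and the step would be safeguarded by a line search.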

Similar Papers
  • Research Article
  • Cited by 940
  • 10.1137/1011036
Convergence Conditions for Ascent Methods
  • Apr 1, 1969
  • SIAM Review
  • Philip Wolfe


  • Research Article
  • Cited by 84
  • 10.1016/j.cam.2013.04.032
A modified Polak–Ribière–Polyak conjugate gradient algorithm for nonsmooth convex programs
  • Apr 26, 2013
  • Journal of Computational and Applied Mathematics
  • Gonglin Yuan + 2 more


  • Research Article
  • Cited by 38
  • 10.1080/10556788.2017.1296439
Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity
  • May 31, 2017
  • Optimization Methods and Software
  • Anthony Man-Cho So + 1 more

Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact gradient computations and hence can be inefficient when the problem size is large or the gradient is difficult to evaluate. Therefore, there has been much interest in inexact gradient methods (IGMs), in which an efficiently computable approximate gradient is used to perform the update in each iteration. Currently, non-asymptotic linear convergence results for IGMs are typically established under the assumption that the objective function is strongly convex, which is not satisfied in many applications of interest; while linear convergence results that do not require the strong convexity assumption are usually asymptotic in nature. In this paper, we combine the best of these two types of results by developing a framework for analysing the non-asymptotic convergence rates of IGMs when they are applied to a class of structured convex optimization problems that includes least squares regression and logistic regression. We then demonstrate the power of our framework by proving, in a unified manner, new linear convergence results for three recently proposed algorithms—the incremental gradient method with increasing sample size [R.H. Byrd, G.M. Chin, J. Nocedal, and Y. Wu, Sample size selection in optimization methods for machine learning, Math. Program. Ser. B 134 (2012), pp. 127–155; M.P. Friedlander and M. Schmidt, Hybrid deterministic–stochastic methods for data fitting, SIAM J. Sci. Comput. 34 (2012), pp. A1380–A1405], the stochastic variance-reduced gradient (SVRG) method [R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference, 2013, pp. 315–323], and the incremental aggregated gradient (IAG) method [D. Blatt, A.O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim. 18 (2007), pp. 29–51]. We believe that our techniques will find further applications in the non-asymptotic convergence analysis of other first-order methods.
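For context, the SVRG update analysed above can be sketched on a least squares problem (the data, step size, and epoch count below are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

# SVRG sketch for least squares, f(x) = (1/2n) * ||A x - b||^2.
rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.standard_normal((n, d))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(n)

def f(x):
    r = A @ x - b
    return 0.5 * (r @ r) / n

def grad_i(x, i):                  # gradient of the i-th component function
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta = 0.01
for epoch in range(200):
    x_snap = x.copy()              # snapshot point
    mu = full_grad(x_snap)         # full gradient at the snapshot
    for _ in range(n):
        i = int(rng.integers(n))
        # variance-reduced stochastic gradient: unbiased, and its variance
        # vanishes as both x and x_snap approach the minimizer
        v = grad_i(x, i) - grad_i(x_snap, i) + mu
        x = x - eta * v
```

Because each epoch's stochastic gradients are corrected by the snapshot's full gradient, a constant step size suffices for linear convergence on this strongly convex finite sum.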

  • Research Article
  • Cited by 124
  • 10.1007/s10589-007-9145-6
Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions
  • Nov 15, 2007
  • Computational Optimization and Applications
  • Anna Von Heusinger + 1 more

We consider the generalized Nash equilibrium problem which, in contrast to the standard Nash equilibrium problem, allows joint constraints of all players involved in the game. Using a regularized Nikaido-Isoda-function, we then present three optimization problems related to the generalized Nash equilibrium problem. The first optimization problem is a complete reformulation of the generalized Nash game in the sense that the global minima are precisely the solutions of the game. However, this reformulation is nonsmooth. We then modify this approach and obtain a smooth constrained optimization problem whose global minima correspond to so-called normalized Nash equilibria. The third approach uses the difference of two regularized Nikaido-Isoda-functions in order to get a smooth unconstrained optimization problem whose global minima are, once again, precisely the normalized Nash equilibria. Conditions for stationary points to be global minima of the two smooth optimization problems are also given. Some numerical results illustrate the behaviour of our approaches.

  • Research Article
  • Cited by 3
  • 10.1016/j.amc.2015.07.081
A superlinearly convergent QP-free algorithm for mathematical programs with equilibrium constraints
  • Aug 24, 2015
  • Applied Mathematics and Computation
  • Jianling Li + 2 more


  • Research Article
  • Cited by 12
  • 10.1007/s10957-020-01636-7
A Modified Nonlinear Conjugate Gradient Algorithm for Large-Scale Nonsmooth Convex Optimization
  • Feb 12, 2020
  • Journal of Optimization Theory and Applications
  • Tsegay Giday Woldu + 3 more

Nonlinear conjugate gradient methods are among the most popular and inexpensive methods for solving smooth optimization problems. Owing to their simplicity and low memory requirements, they are especially attractive for large-scale smooth problems. Conjugate gradient methods use the gradient and the previous direction to determine the next search direction, and they require no numerical linear algebra. However, nonlinear conjugate gradient methods have not been widely employed for solving nonsmooth optimization problems. In this paper, a modified nonlinear conjugate gradient method, which achieves global convergence and numerical efficiency, is proposed to solve large-scale nonsmooth convex problems. The search direction of the new method satisfies a sufficient descent property and belongs to a trust region. Under suitable conditions, the global convergence of the proposed algorithm is established for nonsmooth convex problems. The numerical efficiency of the proposed algorithm is tested and compared with some existing methods on large-scale nonsmooth academic test problems. The numerical results show that the new algorithm performs very well on large-scale nonsmooth problems.
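The classical smooth scheme this line of work builds on can be sketched as follows (a generic PRP+ conjugate gradient method with Armijo backtracking on an arbitrary quadratic test function, not the modified nonsmooth variant proposed in the paper):

```python
import numpy as np

# The next direction combines the current gradient with the previous
# direction:  d_{k+1} = -g_{k+1} + beta_{k+1} * d_k.

def f(x):
    return 0.5 * x[0]**2 + 2.0 * x[1]**2 + 0.1 * x[0] * x[1]

def grad(x):
    return np.array([x[0] + 0.1 * x[1], 0.1 * x[0] + 4.0 * x[1]])

x = np.array([5.0, -3.0])
g = grad(x)
d = -g
for _ in range(100):
    if np.linalg.norm(g) < 1e-8:
        break
    if g @ d >= 0:                 # safeguard: restart with steepest descent
        d = -g
    t = 1.0                        # backtracking Armijo line search
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
        t *= 0.5
    x_new = x + t * d
    g_new = grad(x_new)
    # PRP+ parameter: truncated at zero so stalled progress resets toward -g
    beta = max(0.0, g_new @ (g_new - g) / (g @ g))
    d = -g_new + beta * d
    x, g = x_new, g_new
```

The truncation `max(0, ...)` is the standard PRP+ fix that restores convergence guarantees which plain PRP lacks on general functions.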

  • Research Article
  • 10.25972/opus-24906
Theoretical and numerical investigation of optimal control problems governed by kinetic models
  • Jan 1, 2021
  • Jan Bartsch


  • Research Article
  • Cited by 7
  • 10.1007/s11075-022-01495-5
A class of spectral conjugate gradient methods for Riemannian optimization
  • Jan 13, 2023
  • Numerical Algorithms
  • Chunming Tang + 3 more

Spectral conjugate gradient (SCG) methods are combinations of spectral gradient method and conjugate gradient (CG) methods, which have been well studied in Euclidean space. In this paper, we aim to extend this class of methods to solve optimization problems on Riemannian manifolds. Firstly, we present a Riemannian version of the spectral parameter, which guarantees that the search direction always satisfies the sufficient descent property without the help of any line search strategy. Secondly, we introduce a generic algorithmic framework for the Riemannian SCG methods, in which the selection of the CG parameter is very flexible. Under the Riemannian Wolfe conditions, the global convergence of the proposed algorithmic framework is established whenever the absolute value of the CG parameter is no more than the Riemannian Fletcher–Reeves CG parameter. Finally, some preliminary numerical results are reported and compared with several classical Riemannian CG methods, which show that our new methods are efficient.

  • Research Article
  • Cited by 4
  • 10.1016/j.amc.2013.05.011
A dwindling filter inexact projected Hessian algorithm for large scale nonlinear constrained optimization
  • Jun 11, 2013
  • Applied Mathematics and Computation
  • Chao Gu


  • Research Article
  • Cited by 31
  • 10.1007/s10107-018-1282-4
Generalized self-concordant functions: a recipe for Newton-type methods
  • May 8, 2018
  • Mathematical Programming
  • Tianxiao Sun + 1 more

We study the smooth structure of convex functions by generalizing the powerful concept of self-concordance, introduced by Nesterov and Nemirovskii in the early 1990s, to a broader class of convex functions which we call generalized self-concordant functions. This notion allows us to develop a unified framework for designing Newton-type methods to solve convex optimization problems. The proposed theory provides a mathematical tool to analyze both local and global convergence of Newton-type methods without imposing unverifiable assumptions as long as the underlying functionals fall into our class of generalized self-concordant functions. First, we introduce the class of generalized self-concordant functions which covers the class of standard self-concordant functions as a special case. Next, we establish several properties and key estimates of this function class which can be used to design numerical methods. Then, we apply this theory to develop several Newton-type methods for solving a class of smooth convex optimization problems involving generalized self-concordant functions. We provide an explicit step-size for a damped-step Newton-type scheme which can guarantee global convergence without performing any globalization strategy. We also prove local quadratic convergence of this method and its full-step variant without requiring the Lipschitz continuity of the objective Hessian mapping. Then, we extend our result to develop proximal Newton-type methods for a class of composite convex minimization problems involving generalized self-concordant functions. We also achieve both global and local convergence without additional assumptions. Finally, we verify our theoretical results via several numerical examples, and compare them with existing methods.
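The damped-step idea can be illustrated on a standard self-concordant function, here f(x) = x - log(x) with minimizer x = 1 (a one-dimensional example chosen for this sketch, not taken from the paper): the classical step size 1/(1 + lambda), with lambda the Newton decrement, gives global convergence without any line search.

```python
import math

# f(x) = x - log(x) on (0, inf):  f'(x) = 1 - 1/x,  f''(x) = 1/x^2.
def fprime(x):
    return 1.0 - 1.0 / x

def fsecond(x):
    return 1.0 / (x * x)

x = 0.1                                  # arbitrary start inside the domain
for _ in range(50):
    g, h = fprime(x), fsecond(x)
    lam = math.sqrt(g * g / h)           # Newton decrement
    if lam < 1e-10:
        break
    x -= (1.0 / (1.0 + lam)) * g / h     # damped Newton step
```

The damping keeps every iterate inside the domain; once the decrement drops below 1 the method transitions to the usual local quadratic rate.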

  • Research Article
  • 10.1016/j.cam.2022.114350
An extended projected residual algorithm for solving smooth convex optimization problems
  • Apr 7, 2022
  • Journal of Computational and Applied Mathematics
  • William La Cruz


  • Research Article
  • Cited by 11
  • 10.1080/00207160.2018.1494825
The Hager–Zhang conjugate gradient algorithm for large-scale nonlinear equations
  • Jul 10, 2018
  • International Journal of Computer Mathematics
  • Gonglin Yuan + 2 more

In this paper, the Hager–Zhang (HZ) conjugate gradient (CG) algorithm is studied for large-scale smooth optimization problems. (i) Some results of the HZ CG method for smooth unconstrained optimization problems are given, and a modified HZ (MHZ) CG method is proposed; (ii) the HZ and MHZ CG methods for nonlinear equations are analysed, their global convergence is established, and numerical results for large-scale nonlinear equation problems (100,000 variables) are reported.

  • Conference Article
  • Cited by 2
  • 10.1109/cdc.1986.267602
Optimal routing in circuit-switched communication networks
  • Dec 1, 1986
  • Alexander Gersht + 1 more

We consider the optimal circuit routing problem. The problem consists of accommodating a given circuit demand in an existing circuit-switched network. The objective is to find a circuit accommodation providing the maximum residual capacity over the network under the total circuit cost constraints. Practical considerations require a solution which is robust to the variations in circuit demand and cost. The objective function for the circuit routing problem is not a smooth one. In order to overcome the difficulties of nonsmooth optimization, a sequence of smooth convex optimization problems is considered. The optimal algorithm for the circuit routing problem is obtained as a limiting case of the sequence of the optimal routing strategies for the corresponding smooth optimization problems. The proof of its convergence to the optimal solution is given. This optimization algorithm is capable of efficiently handling networks with a large number of commodities. It also satisfies the above-mentioned robustness requirements. Numerical results are discussed.

  • Book Chapter
  • 10.1007/11759966_138
A Smoothing Multiple Support Vector Machine Model
  • Jan 1, 2006
  • Huihong Jin + 2 more

In this paper, we study a smoothing multiple support vector machine (SVM) using an exact penalty function. First, we formulate the optimization problem of multiple SVM as an unconstrained, nonsmooth optimization problem via the exact penalty function. Then, we propose a twice-differentiable function that approximately smooths the exact penalty function, yielding an unconstrained, smooth optimization problem. By error analysis, an approximate solution of the multiple SVM can be obtained by solving this smooth, unconstrained penalty problem. Finally, we present a corporate culture model using multiple SVM as a practical example. Numerical experiments show that our smoothing multiple SVM achieves better precision than an artificial neural network.
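The paper's specific twice-differentiable smoothing function is not reproduced here; as a generic stand-in, a softplus-type smoothing of the exact-penalty term max(0, t) behaves similarly, replacing the kink with a smooth curve whose error is at most eps * log(2):

```python
import math

def smooth_plus(t, eps=1e-2):
    """Smooth approximation of the penalty term max(0, t).

    Softplus smoothing, infinitely differentiable; the largest deviation
    from max(0, t) is eps * log(2), attained at t = 0.
    """
    z = t / eps
    if z > 30.0:                 # avoid overflow: softplus(z) ~ z for large z
        return t
    return eps * math.log1p(math.exp(z))
```

As eps shrinks, the smooth penalty converges uniformly to the exact one, which is what lets the smooth problem's solution approximate the original SVM solution.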

  • Book Chapter
  • Cited by 1
  • 10.1007/11596448_83
A Smoothing Support Vector Machine Based on Exact Penalty Function
  • Jan 1, 2005
  • Zhiqing Meng + 3 more

In this paper, we study a smoothing support vector machine (SVM) using an exact penalty function. First, we formulate the optimization problem of SVM as an unconstrained, nonsmooth optimization problem via the exact penalty function. Second, we propose a twice-differentiable function that approximately smooths the exact penalty function, yielding an unconstrained, smooth optimization problem. Third, by error analysis, an approximate solution of the SVM can be obtained by solving this smooth, unconstrained penalty problem. Numerical experiments show that the prediction precision of our smoothing SVM is better than that of an artificial neural network and a time-series method.
