A globally convergent gradient method with momentum
Abstract: In this work, we consider smooth unconstrained optimization problems and deal with the class of gradient methods with momentum, i.e., descent algorithms whose search direction is a linear combination of the current gradient and the preceding search direction. This family includes nonlinear conjugate gradient methods and Polyak's heavy-ball approach, and is thus of high practical and theoretical interest in large-scale nonlinear optimization. We propose a general framework in which the scalars of the linear combination defining the search direction are computed simultaneously by minimizing an approximate quadratic model over the two-dimensional subspace. This strategy allows us to define a class of gradient methods with momentum that enjoys global convergence guarantees and an optimal worst-case complexity bound in the nonconvex setting. Unlike all related works in the literature, the convergence conditions are stated in terms of the Hessian matrix of the bi-dimensional quadratic model. To the best of our knowledge, these results are novel to the literature. Moreover, extensive computational experiments show that the gradient method with momentum presented here is competitive with other popular solvers for nonconvex unconstrained problems.
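The subspace strategy described in the abstract can be sketched in a few lines: given a quadratic model m(d) = gᵀd + ½ dᵀBd, choosing the direction as a linear combination of −g and the previous direction amounts to solving a 2×2 reduced system. This is a minimal illustrative sketch, assuming some model Hessian B is available; the function and variable names are ours, not the paper's.

```python
import numpy as np

def momentum_direction(g, d_prev, B):
    """Hypothetical sketch: pick d = beta * (-g) + eta * d_prev by
    minimizing the quadratic model m(d) = g.T @ d + 0.5 * d.T @ B @ d
    over the two-dimensional subspace span{-g, d_prev}."""
    S = np.column_stack([-g, d_prev])   # subspace basis (n x 2)
    H = S.T @ B @ S                     # 2x2 reduced Hessian
    b = S.T @ g                         # reduced gradient
    coeffs = np.linalg.solve(H, -b)     # minimizer of the reduced model
    return S @ coeffs

# Toy example on a convex quadratic f(x) = 0.5 x^T B x - c^T x
B = np.diag([1.0, 10.0])
c = np.array([1.0, 1.0])
x = np.zeros(2)
d_prev = np.array([1.0, 0.0])
g = B @ x - c
d = momentum_direction(g, d_prev, B)    # a descent direction: g @ d < 0
```

Because both coefficients are obtained at once from the reduced model, no separate rule for the momentum parameter is needed in this sketch.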
- Research Article
940
- 10.1137/1011036
- Apr 1, 1969
- SIAM Review
Convergence Conditions for Ascent Methods
- Research Article
84
- 10.1016/j.cam.2013.04.032
- Apr 26, 2013
- Journal of Computational and Applied Mathematics
A modified Polak–Ribière–Polyak conjugate gradient algorithm for nonsmooth convex programs
- Research Article
38
- 10.1080/10556788.2017.1296439
- May 31, 2017
- Optimization Methods and Software
Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact gradient computations and hence can be inefficient when the problem size is large or the gradient is difficult to evaluate. Therefore, there has been much interest in inexact gradient methods (IGMs), in which an efficiently computable approximate gradient is used to perform the update in each iteration. Currently, non-asymptotic linear convergence results for IGMs are typically established under the assumption that the objective function is strongly convex, which is not satisfied in many applications of interest; while linear convergence results that do not require the strong convexity assumption are usually asymptotic in nature. In this paper, we combine the best of these two types of results by developing a framework for analysing the non-asymptotic convergence rates of IGMs when they are applied to a class of structured convex optimization problems that includes least squares regression and logistic regression. We then demonstrate the power of our framework by proving, in a unified manner, new linear convergence results for three recently proposed algorithms—the incremental gradient method with increasing sample size [R.H. Byrd, G.M. Chin, J. Nocedal, and Y. Wu, Sample size selection in optimization methods for machine learning, Math. Program. Ser. B 134 (2012), pp. 127–155; M.P. Friedlander and M. Schmidt, Hybrid deterministic–stochastic methods for data fitting, SIAM J. Sci. Comput. 34 (2012), pp. A1380–A1405], the stochastic variance-reduced gradient (SVRG) method [R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference, 2013, pp. 
315–323], and the incremental aggregated gradient (IAG) method [D. Blatt, A.O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim. 18 (2007), pp. 29–51]. We believe that our techniques will find further applications in the non-asymptotic convergence analysis of other first-order methods.
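Of the three algorithms analysed above, SVRG admits a compact sketch. The following is an illustrative implementation for least-squares regression, one of the structured problems the framework covers; the step size, epoch counts, and problem data here are our own assumptions, not values from the paper.

```python
import numpy as np

# Illustrative SVRG sketch for least squares: min_x (1/2n) ||A x - b||^2.
# Hyperparameters below are assumptions chosen for this toy problem.
rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                      # consistent system, minimizer x_true

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
step = 0.01
for _ in range(30):                 # outer epochs
    x_snap = x.copy()
    mu = full_grad(x_snap)          # full gradient at the snapshot
    for _ in range(2 * n):          # inner stochastic steps
        i = rng.integers(n)
        gi = A[i] * (A[i] @ x - b[i])
        gi_snap = A[i] * (A[i] @ x_snap - b[i])
        x = x - step * (gi - gi_snap + mu)   # variance-reduced update
```

The correction term `gi_snap - mu` is what removes the gradient-noise variance near the snapshot and yields the linear convergence rates the paper analyses.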
- Research Article
124
- 10.1007/s10589-007-9145-6
- Nov 15, 2007
- Computational Optimization and Applications
We consider the generalized Nash equilibrium problem which, in contrast to the standard Nash equilibrium problem, allows joint constraints of all players involved in the game. Using a regularized Nikaido-Isoda-function, we then present three optimization problems related to the generalized Nash equilibrium problem. The first optimization problem is a complete reformulation of the generalized Nash game in the sense that the global minima are precisely the solutions of the game. However, this reformulation is nonsmooth. We then modify this approach and obtain a smooth constrained optimization problem whose global minima correspond to so-called normalized Nash equilibria. The third approach uses the difference of two regularized Nikaido-Isoda-functions in order to get a smooth unconstrained optimization problem whose global minima are, once again, precisely the normalized Nash equilibria. Conditions for stationary points to be global minima of the two smooth optimization problems are also given. Some numerical results illustrate the behaviour of our approaches.
- Research Article
3
- 10.1016/j.amc.2015.07.081
- Aug 24, 2015
- Applied Mathematics and Computation
A superlinearly convergent QP-free algorithm for mathematical programs with equilibrium constraints
- Research Article
12
- 10.1007/s10957-020-01636-7
- Feb 12, 2020
- Journal of Optimization Theory and Applications
Nonlinear conjugate gradient methods are among the simplest and most widely used methods for smooth optimization problems. Owing to their simplicity and low memory requirements, they are particularly attractive for large-scale smooth problems. Conjugate gradient methods use the gradient and the previous direction to determine the next search direction, and they require no numerical linear algebra. However, nonlinear conjugate gradient methods have not been widely employed for nonsmooth optimization problems. In this paper, a modified nonlinear conjugate gradient method, which achieves global convergence and numerical efficiency, is proposed to solve large-scale nonsmooth convex problems. The search direction of the new method satisfies the sufficient descent property and belongs to a trust region. Under suitable conditions, the global convergence of the proposed algorithm is established for nonsmooth convex problems. Its numerical efficiency is tested and compared with some existing methods on large-scale nonsmooth academic test problems. The numerical results show that the new algorithm performs very well on large-scale nonsmooth problems.
- Research Article
- 10.25972/opus-24906
- Jan 1, 2021
Theoretical and numerical investigation of optimal control problems governed by kinetic models
- Research Article
7
- 10.1007/s11075-022-01495-5
- Jan 13, 2023
- Numerical Algorithms
Spectral conjugate gradient (SCG) methods are combinations of spectral gradient method and conjugate gradient (CG) methods, which have been well studied in Euclidean space. In this paper, we aim to extend this class of methods to solve optimization problems on Riemannian manifolds. Firstly, we present a Riemannian version of the spectral parameter, which guarantees that the search direction always satisfies the sufficient descent property without the help of any line search strategy. Secondly, we introduce a generic algorithmic framework for the Riemannian SCG methods, in which the selection of the CG parameter is very flexible. Under the Riemannian Wolfe conditions, the global convergence of the proposed algorithmic framework is established whenever the absolute value of the CG parameter is no more than the Riemannian Fletcher–Reeves CG parameter. Finally, some preliminary numerical results are reported and compared with several classical Riemannian CG methods, which show that our new methods are efficient.
- Research Article
4
- 10.1016/j.amc.2013.05.011
- Jun 11, 2013
- Applied Mathematics and Computation
A dwindling filter inexact projected Hessian algorithm for large scale nonlinear constrained optimization
- Research Article
31
- 10.1007/s10107-018-1282-4
- May 8, 2018
- Mathematical Programming
We study the smooth structure of convex functions by generalizing a powerful concept so-called self-concordance introduced by Nesterov and Nemirovskii in the early 1990s to a broader class of convex functions which we call generalized self-concordant functions. This notion allows us to develop a unified framework for designing Newton-type methods to solve convex optimization problems. The proposed theory provides a mathematical tool to analyze both local and global convergence of Newton-type methods without imposing unverifiable assumptions as long as the underlying functionals fall into our class of generalized self-concordant functions. First, we introduce the class of generalized self-concordant functions which covers the class of standard self-concordant functions as a special case. Next, we establish several properties and key estimates of this function class which can be used to design numerical methods. Then, we apply this theory to develop several Newton-type methods for solving a class of smooth convex optimization problems involving generalized self-concordant functions. We provide an explicit step-size for a damped-step Newton-type scheme which can guarantee a global convergence without performing any globalization strategy. We also prove a local quadratic convergence of this method and its full-step variant without requiring the Lipschitz continuity of the objective Hessian mapping. Then, we extend our result to develop proximal Newton-type methods for a class of composite convex minimization problems involving generalized self-concordant functions. We also achieve both global and local convergence without additional assumptions. Finally, we verify our theoretical results via several numerical examples, and compare them with existing methods.
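As a concrete illustration of the damped-step Newton scheme with an explicit step size, the following hedged sketch uses the classical step 1/(1 + λₖ), where λₖ is the Newton decrement, on the standard self-concordant function f(x) = cᵀx − Σᵢ log xᵢ on x > 0. The paper's generalized self-concordant class is broader, and the names here are ours.

```python
import numpy as np

def damped_newton(c, x0, tol=1e-10, max_iter=100):
    """Damped Newton for f(x) = c^T x - sum(log x_i), x > 0.
    Uses the explicit step size 1/(1 + lambda), no line search."""
    x = x0.copy()
    for _ in range(max_iter):
        g = c - 1.0 / x                 # gradient
        H_inv = x ** 2                  # Hessian is diag(1/x^2)
        n_step = H_inv * g              # Newton direction H^{-1} g
        lam = np.sqrt(g @ n_step)       # Newton decrement
        if lam < tol:
            break
        x = x - n_step / (1.0 + lam)    # damped step keeps x in the domain
    return x

c = np.array([1.0, 2.0])
x_star = damped_newton(c, np.array([0.5, 0.5]))   # minimizer is x_i = 1/c_i
```

The damping factor 1/(1 + λ) is what guarantees global convergence without a globalization strategy for this function class, as discussed in the abstract above.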
- Research Article
- 10.1016/j.cam.2022.114350
- Apr 7, 2022
- Journal of Computational and Applied Mathematics
An extended projected residual algorithm for solving smooth convex optimization problems
- Research Article
11
- 10.1080/00207160.2018.1494825
- Jul 10, 2018
- International Journal of Computer Mathematics
Abstract: In this paper, the Hager–Zhang (HZ) conjugate gradient (CG) algorithm is studied for large-scale smooth optimization problems. (i) Some results of the HZ CG method for smooth unconstrained optimization problems are given, and a modified HZ (MHZ) CG method is proposed; (ii) the HZ and MHZ CG methods for nonlinear equations are analysed, global convergence is established, and numerical results for large-scale nonlinear equation problems (100,000 variables) are reported.
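For reference, the standard HZ parameter is βₖ = (yₖ − 2dₖ‖yₖ‖²/(dₖᵀyₖ))ᵀ gₖ₊₁ / (dₖᵀyₖ) with yₖ = gₖ₊₁ − gₖ. Below is a minimal sketch of that standard formula (not the paper's modified MHZ variant) on a small quadratic, where an exact line search is available in closed form.

```python
import numpy as np

def beta_hz(g_new, g_old, d):
    """Standard Hager-Zhang CG parameter (not the modified MHZ variant)."""
    y = g_new - g_old
    dy = d @ y
    return (y - 2.0 * d * (y @ y) / dy) @ g_new / dy

# Demo on f(x) = 0.5 x^T A x - b^T x with exact line search.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
g = A @ x - b
d = -g
for _ in range(50):
    alpha = -(g @ d) / (d @ A @ d)      # exact step on the quadratic
    x = x + alpha * d
    g_new = A @ x - b
    if np.linalg.norm(g_new) < 1e-10:   # converged
        break
    d = -g_new + beta_hz(g_new, g, d) * d
    g = g_new
```

On a quadratic with exact line search, dₖᵀgₖ₊₁ = 0, so the HZ parameter reduces to the Hestenes–Stiefel one and the iteration terminates in at most n steps.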
- Conference Article
2
- 10.1109/cdc.1986.267602
- Dec 1, 1986
We consider the optimal circuit routing problem. The problem consists of accommodating a given circuit demand in an existing circuit-switched network. The objective is to find a circuit accommodation providing the maximum residual capacity over the network under the total circuit cost constraints. Practical considerations require a solution which is robust to the variations in circuit demand and cost. The objective function for the circuit routing problem is not a smooth one. In order to overcome the difficulties of nonsmooth optimization, a sequence of smooth convex optimization problems is considered. The optimal algorithm for the circuit routing problem is obtained as a limiting case of the sequence of the optimal routing strategies for the corresponding smooth optimization problems. The proof of its convergence to the optimal solution is given. This optimization algorithm is capable of efficiently handling networks with a large number of commodities. It also satisfies the above-mentioned robustness requirements. Numerical results are discussed.
- Book Chapter
- 10.1007/11759966_138
- Jan 1, 2006
In this paper, we study a smoothing multiple support vector machine (SVM) by using an exact penalty function. First, we formulate the optimization problem of multiple SVM as an unconstrained, nonsmooth optimization problem via the exact penalty function. Then, we propose a twice-differentiable function that approximately smooths the exact penalty function, yielding an unconstrained smooth optimization problem. By error analysis, an approximate solution of the multiple SVM can be obtained by solving this approximately smooth, unconstrained penalty optimization problem. Finally, we give a corporate culture model using multiple SVM as a practical example. Numerical experiments illustrate that the precision of our smoothing multiple SVM is better than that of an artificial neural network.
- Book Chapter
1
- 10.1007/11596448_83
- Jan 1, 2005
In this paper, we study a smoothing support vector machine (SVM) by using an exact penalty function. First, we formulate the optimization problem of SVM as an unconstrained, nonsmooth optimization problem via the exact penalty function. Second, we propose a twice-differentiable function that approximately smooths the exact penalty function, yielding an unconstrained smooth optimization problem. Third, by error analysis, an approximate solution of the SVM can be obtained by solving this approximately smooth, unconstrained penalty optimization problem. Numerical experiments illustrate that the prediction precision of our smoothing SVM is better than that of an artificial neural network and a time-series model.