Abstract

Accelerating the basic gradient iteration is a critical issue in numerical optimization and deep learning. Recently, William Kahan [Kahan, Automatic Step-Size Control for Minimization Iterations, Technical report, University of California, Berkeley CA, USA, 2019] proposed automatic step-size strategies for the gradient descent iteration that do not explicitly use any prior information about the Hessian; moreover, a new momentum-based gradient acceleration method, namely the Anadromic Gradient Descent (AGD) iteration, was proposed. Besides accelerating the basic gradient descent iteration, AGD allows the iteration to return to past iterates by simply reversing the steps in sign and order within the same updating formula. The numerical performance of the automatic step-size strategies and of AGD is claimed to be favourable. In this paper, for the quadratic model, through a revisit of some classical momentum-based gradient methods, we perform a new analysis of their convergence and optimal hyper-parameters. We also investigate the convergence behaviour of AGD with the optimal hyper-parameters and connect one of Kahan's automatic step-size schemes with the long Barzilai–Borwein step-size. Numerical results are presented to reflect the theoretical convergence behaviours and demonstrate the practical performance of various momentum-based gradient methods.
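To make the objects mentioned in the abstract concrete, the following is a minimal sketch (not the paper's implementation, and not AGD itself) of two of the ingredients discussed: heavy-ball momentum with the standard optimal hyper-parameters for a quadratic model, and gradient descent with the long Barzilai–Borwein step-size. The function names, the test problem, and the hyper-parameter choices are illustrative assumptions.

```python
# Sketch only: heavy-ball momentum and long BB gradient descent on the
# quadratic model f(x) = 0.5 x^T A x - b^T x (hypothetical test problem).
import numpy as np

def heavy_ball(A, b, x0, alpha, beta, iters=200):
    """Classical momentum: x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        grad = A @ x - b
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x

def gd_long_bb(A, b, x0, alpha0=1e-3, iters=200):
    """Gradient descent with the long BB step alpha_k = (s^T s) / (s^T y)."""
    x = x0.copy()
    g = A @ x - b
    x_new = x - alpha0 * g            # one fixed step to initialise s and y
    for _ in range(iters):
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)     # long Barzilai–Borwein step-size (BB1)
        x, g = x_new, g_new
        x_new = x - alpha * g
    return x_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((50, 50))
    A = Q.T @ Q + np.eye(50)          # symmetric positive definite Hessian
    b = rng.standard_normal(50)
    x_star = np.linalg.solve(A, b)
    # Textbook optimal heavy-ball hyper-parameters for a quadratic with
    # extreme Hessian eigenvalues mu and L (an assumption, not the paper's result).
    eigvals = np.linalg.eigvalsh(A)
    mu, L = eigvals[0], eigvals[-1]
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    for name, xk in [("heavy-ball", heavy_ball(A, b, np.zeros(50), alpha, beta)),
                     ("long-BB GD", gd_long_bb(A, b, np.zeros(50)))]:
        print(name, "error:", np.linalg.norm(xk - x_star))
```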
