Abstract

Owing to their faster convergence than gradient descent methods and lower computational cost than second-order methods, conjugate gradient (CG) algorithms have been widely used in machine learning. This paper considers conjugate gradient in the mini-batch setting. Concretely, we propose a stable adaptive stochastic conjugate gradient (SCG) algorithm by incorporating both the stochastic recursive gradient algorithm (SARAH) and second-order information into a CG-type algorithm. Unlike most existing CG algorithms, which spend considerable time determining the step size through line search and may fail in stochastic optimization, the proposed algorithms use a local quadratic model to estimate the step size sequence without computing Hessian information, so their computational cost remains as low as that of first-order algorithms. We establish a linear convergence rate for a class of SCG algorithms when the loss function is strongly convex. Moreover, we show that the complexity of the proposed algorithm matches that of modern stochastic optimization algorithms. As a by-product, we develop a practical variant of the proposed algorithm by setting a stopping criterion for the number of inner-loop iterations. Various numerical experiments on machine learning problems demonstrate the efficiency of the proposed algorithms.
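To make the high-level recipe concrete, the sketch below illustrates one way such an SCG loop could be organized in Python: a SARAH recursive gradient estimator, a CG-type search direction, and a step size taken as the minimizer of a local quadratic model whose curvature term is approximated by a finite difference of mini-batch gradients, so no Hessian is formed. This is a minimal sketch under assumptions of our own; the Fletcher-Reeves-style beta, the finite-difference curvature estimate, and the helper names grad_i and full_grad are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def scg_sarah(grad_i, full_grad, w0, n, epochs=10, inner_m=None,
              batch_size=8, eps=1e-6, rng=None):
    """Hedged sketch of a SARAH-based stochastic CG loop (illustrative, not the paper's exact method).

    grad_i(w, idx): mini-batch gradient at w over sample indices idx.
    full_grad(w):   full gradient at w over all n samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if inner_m is None else inner_m
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        v = full_grad(w)          # SARAH outer step: full gradient as anchor
        d = -v                    # initial CG direction: steepest descent
        for _ in range(m):
            idx = rng.integers(n, size=batch_size)
            g = grad_i(w, idx)
            # Hessian-free curvature along d via a gradient finite difference
            hd = (grad_i(w + eps * d, idx) - g) / eps
            curv = d @ hd
            # Minimizer of the local quadratic model along d; small fallback
            # step if the sampled curvature is not positive
            alpha = -(v @ d) / curv if curv > 1e-12 else 1e-3
            w_new = w + alpha * d
            # SARAH recursive gradient estimator
            v_new = grad_i(w_new, idx) - g + v
            # Fletcher-Reeves-style conjugate direction update (assumed form)
            beta = (v_new @ v_new) / max(v @ v, 1e-12)
            d = -v_new + beta * d
            w, v = w_new, v_new
    return w
```

In this sketch the curvature term costs one extra mini-batch gradient evaluation per inner iteration, which is why the per-iteration cost stays at the level of first-order algorithms.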
