Abstract

Conjugate gradient (CG) algorithms are widely applied to machine learning problems owing to their low computational cost compared with second-order methods and better convergence properties relative to gradient descent methods. This study proposes a biased stochastic conjugate gradient (BSCG) algorithm with an adaptive step size. Specifically, the BSCG algorithm integrates both the stochastic recursive gradient method (SARAH) and a modified Barzilai–Borwein (BB) technique into the typical stochastic gradient algorithm. Most available stochastic gradient methods rely on a carefully tuned fixed step size, which may prevent them from finding an optimal solution; in contrast, the proposed algorithm uses second-order information to obtain an appropriate step size without increasing the computational cost. We not only prove that the proposed algorithm converges to the global optimum but also establish its linear convergence rate for nonconvex objective functions. Numerical results on two machine learning models demonstrate the superiority of the BSCG algorithm.
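To make the ingredients named above concrete, the following is a minimal sketch, not the paper's exact BSCG update rules: it combines a SARAH-style recursive gradient estimator, a conjugate-gradient search direction with a textbook Fletcher–Reeves coefficient, and a plain Barzilai–Borwein (BB1) step size, applied to regularized logistic regression. The paper's modified BB formula and CG coefficient may differ; all names below are illustrative.

    import numpy as np

    def logistic_grad(w, X, y, lam):
        # Gradient of (1/n) * sum log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2
        p = 1.0 / (1.0 + np.exp(-y * (X @ w)))      # sigmoid(y_i * x_i^T w)
        return X.T @ (-y * (1.0 - p)) / len(y) + lam * w

    def bscg_sketch(X, y, lam=1e-3, epochs=20, inner=100, batch=8, eta=0.1, seed=0):
        # Illustrative only: SARAH estimator + CG direction + BB1 step size.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        w_prev = g_prev = None
        for _ in range(epochs):
            g_full = logistic_grad(w, X, y, lam)     # full gradient anchors SARAH
            if w_prev is not None:                   # BB1 step from outer iterates
                s, yk = w - w_prev, g_full - g_prev
                if s @ yk > 1e-12:
                    eta = min((s @ s) / (s @ yk), 1.0)   # cap for stability in this toy sketch
            w_prev, g_prev = w.copy(), g_full.copy()
            v = g_full                               # SARAH estimator v_0
            direction = -v                           # first search direction
            w_old = w.copy()
            w = w + eta * direction
            for _ in range(inner):
                idx = rng.choice(n, size=batch, replace=False)
                Xi, yi = X[idx], y[idx]
                # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
                v_new = logistic_grad(w, Xi, yi, lam) - logistic_grad(w_old, Xi, yi, lam) + v
                # Fletcher-Reeves CG coefficient (a stand-in for BSCG's rule)
                beta = (v_new @ v_new) / max(v @ v, 1e-12)
                direction = -v_new + beta * direction
                w_old, v = w.copy(), v_new
                w = w + eta * direction
        return w

    # Usage on synthetic binary classification data
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 20))
    y = np.sign(X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(500))
    w_hat = bscg_sketch(X, y)
    print("final full-gradient norm:", np.linalg.norm(logistic_grad(w_hat, X, y, 1e-3)))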
