Abstract

We consider several line-search based gradient methods for stochastic optimization: gradient and accelerated gradient methods for convex optimization and a gradient method for non-convex optimization. The methods simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic approximation of the gradient. The focus of this paper is a numerical comparison of these methods with state-of-the-art adaptive methods based on a different idea: using the norm of the stochastic gradient to define the stepsize, e.g., AdaGrad and Adam.
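
To make the two stepsize ideas concrete, the sketch below contrasts a backtracking line search, which adaptively estimates the Lipschitz constant of the gradient, with an AdaGrad-style update, which defines the stepsize via accumulated gradient norms. This is an illustrative toy on a deterministic quadratic, not the paper's algorithms; the objective, the constants (L0, eta, eps), and the function names are assumptions made for the example.

```python
# Illustrative sketch (not the paper's methods): two ways to choose a stepsize.
# The backtracking rule adapts a local Lipschitz estimate L; AdaGrad scales steps
# by accumulated squared gradients. All names and constants here are assumptions.
import numpy as np

def f(x, A, b):
    return 0.5 * x @ A @ x - b @ x          # smooth convex quadratic

def grad_f(x, A, b):
    return A @ x - b

def line_search_gd(A, b, x0, iters=100, L0=1.0):
    """Gradient descent with a backtracking estimate of the Lipschitz constant."""
    x, L = x0.copy(), L0
    for _ in range(iters):
        g = grad_f(x, A, b)
        L = max(L / 2, 1e-12)               # optimistically shrink the estimate
        while True:                          # increase L until the descent test holds
            x_new = x - g / L
            # sufficient-decrease condition implied by L-smoothness
            if f(x_new, A, b) <= f(x, A, b) - np.dot(g, g) / (2 * L):
                break
            L *= 2
        x = x_new
    return x

def adagrad(A, b, x0, iters=100, eta=1.0, eps=1e-8):
    """AdaGrad-style update: stepsize defined via accumulated squared gradients."""
    x, G = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        g = grad_f(x, A, b)
        G += g * g                           # per-coordinate accumulation of g^2
        x -= eta * g / (np.sqrt(G) + eps)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(rng.uniform(0.1, 10.0, size=20))   # diagonal positive-definite quadratic
    b = rng.normal(size=20)
    x0 = np.zeros(20)
    x_star = np.linalg.solve(A, b)
    for name, x in [("line search", line_search_gd(A, b, x0)),
                    ("AdaGrad", adagrad(A, b, x0))]:
        print(f"{name}: distance to optimum = {np.linalg.norm(x - x_star):.3e}")
```

In the stochastic setting studied in the paper, the line-search test is additionally affected by the noise in the gradient estimate, which is why the methods also adapt to the variance; the toy above omits that aspect for brevity.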
