Abstract

Recent years have seen increased interest in performance guarantees of gradient descent algorithms for nonconvex optimization. A number of works have uncovered that gradient noise plays a critical role in the ability of gradient descent recursions to efficiently escape saddle points and reach second-order stationary points. Most available works require the gradient noise component to be bounded with probability one or sub-Gaussian, and leverage concentration inequalities to arrive at high-probability results. We present an alternative approach that relies primarily on mean-square arguments, and show that a more relaxed relative bound on the gradient noise variance is sufficient to ensure efficient escape from saddle points, without the need to inject additional noise, employ alternating step sizes, or rely on a global dispersive noise assumption, as long as a gradient noise component is present in a descent direction for every saddle point.
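
As an informal illustration of the phenomenon summarized above (not the paper's algorithm or analysis), the sketch below runs a plain stochastic-gradient recursion on the toy strict saddle f(x, y) = x^2 - y^2. All names and parameters are assumptions chosen for illustration: the noise standard deviation is taken to scale with the gradient norm plus a small floor, a stand-in for a relative variance bound that guarantees a noise component along the negative-curvature (escape) direction near the saddle.

```python
import numpy as np

# Toy strict saddle: f(x, y) = x^2 - y^2 has a saddle at the origin,
# with negative curvature (the escape direction) along the y-axis.
def grad(w):
    x, y = w
    return np.array([2.0 * x, -2.0 * y])

rng = np.random.default_rng(0)
mu = 0.05                    # constant step size (illustrative)
sigma = 0.1                  # noise level (illustrative)
w = np.array([1e-3, 0.0])    # start essentially at the saddle; y = 0 exactly

for k in range(500):
    g = grad(w)
    # Gradient noise whose standard deviation scales with the gradient norm
    # plus a small floor, so a noise component exists along the escape (y)
    # direction even where the gradient vanishes -- a stand-in for a relative
    # bound on the gradient-noise variance. No extra perturbation is injected
    # and the step size is held constant.
    noise = sigma * (np.linalg.norm(g) + 1e-2) * rng.standard_normal(2)
    w = w - mu * (g + noise)
    if abs(w[1]) > 1.0:      # iterate has left the neighborhood of the saddle
        print(f"escaped the saddle after {k + 1} iterations, w = {w}")
        break
```

Once the noise nudges the y-coordinate off zero, the negative curvature amplifies it geometrically, so in this toy setting the recursion leaves the saddle without injected perturbations or alternating step sizes.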
