Abstract

The stochastic gradient descent (SGD) optimization algorithm is one of the central tools used to approximate solutions of stochastic optimization problems arising in machine learning and, in particular, deep learning applications. It is therefore important to analyze the convergence behavior of SGD. In this article we consider a simple quadratic stochastic optimization problem and establish for every γ, ν ∈ (0, ∞) essentially matching lower and upper bounds for the mean square error of the associated SGD process with learning rates (γ n^{−ν})_{n∈ℕ}. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the choice of the learning rates.
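
As a rough illustration of the setting described above, the following Python sketch simulates SGD with learning rates γ n^{−ν} on one simple quadratic stochastic optimization problem and estimates the resulting mean square error by Monte Carlo. The particular objective f(θ) = E[(θ − X)²]/2, the Gaussian distribution of X, and all numerical parameters are assumptions made purely for illustration; they are not taken from the paper.

```python
# Minimal simulation sketch (illustrative assumptions, not the paper's exact
# setting): SGD applied to a simple quadratic stochastic optimization problem,
#   f(theta) = E[(theta - X)^2] / 2,  X ~ N(0, sigma^2),
# with learning rates gamma * n**(-nu). The mean square error after n_steps
# is estimated by Monte Carlo over many independent SGD paths.
import numpy as np


def sgd_mean_square_error(gamma, nu, n_steps=10_000, n_paths=5_000,
                          theta0=1.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of the mean square error of SGD after n_steps.

    For the assumed objective the minimizer is theta* = 0, and the
    stochastic gradient at step n is (theta - X_n) with X_n ~ N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_paths, theta0, dtype=float)        # one SGD path per entry
    for n in range(1, n_steps + 1):
        x_n = rng.normal(0.0, sigma, size=n_paths)       # fresh sample per path
        stochastic_grad = theta - x_n                    # unbiased gradient estimate
        theta -= gamma * n ** (-nu) * stochastic_grad    # learning rate gamma * n^{-nu}
    return float(np.mean(theta ** 2))                    # MSE relative to theta* = 0


if __name__ == "__main__":
    # Compare the estimated error for a few (gamma, nu) choices.
    for gamma, nu in [(1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]:
        mse = sgd_mean_square_error(gamma, nu)
        print(f"gamma={gamma}, nu={nu}: estimated MSE ~ {mse:.3e}")
```

Varying ν in the driver loop (for example comparing ν < 1, ν = 1, and ν > 1) gives a quick empirical feel for how the decay of the learning rates affects the error in this toy instance.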
