Abstract

Stochastic gradient descent (SGD) is a widely used technique in large-scale machine learning tasks, but the inherent variance of its gradient estimates slows convergence. In recent years, a popular method, Stochastic Variance Reduced Gradient (SVRG), has addressed this shortcoming by computing the full gradient over the entire dataset once per epoch. However, conventional SVRG and its variants require a hyperparameter, the epoch size, that is critical to convergence performance. Few previous studies discuss how to systematically choose a suitable value for this hyperparameter, which makes good convergence hard to achieve in practical machine learning applications. In this paper, we propose a new stochastic gradient descent method, AESVRG, which introduces variance reduction and computes the full gradient adaptively. Its enhanced implementation, AESVRG+, achieves convergence performance that outperforms existing SVRG even with fine-tuned epoch sizes. An extensive evaluation demonstrates the significant performance improvement of our method.
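For context, the baseline the abstract refers to is standard SVRG (Johnson & Zhang, 2013), which recomputes the full gradient once per epoch of fixed size and uses it as a control variate for the stochastic gradients. The sketch below is a minimal illustration of that baseline with its fixed epoch-size hyperparameter, not of the proposed AESVRG/AESVRG+; the names `grad_fn`, `epoch_size`, and `step_size` are illustrative assumptions, and the default `epoch_size = 2n` is only a common heuristic.

```python
import numpy as np

def svrg(grad_fn, w0, data, step_size=0.01, epoch_size=None, n_epochs=10, seed=None):
    """Vanilla SVRG with a fixed epoch size (the hyperparameter discussed above).

    grad_fn(w, x) returns the gradient of the per-sample loss at sample x;
    averaging it over `data` gives the full gradient.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    m = epoch_size or 2 * n          # common heuristic; must be tuned in practice
    w_snapshot = np.array(w0, dtype=float)

    for _ in range(n_epochs):
        # Full gradient at the snapshot, recomputed once per epoch.
        full_grad = np.mean([grad_fn(w_snapshot, x) for x in data], axis=0)
        w = w_snapshot.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient (control variate form).
            v = grad_fn(w, data[i]) - grad_fn(w_snapshot, data[i]) + full_grad
            w = w - step_size * v
        w_snapshot = w                # take the last iterate as the next snapshot
    return w_snapshot
```

The point of contrast with the paper's method is the fixed `m`: here the full gradient is recomputed on a rigid schedule, whereas the abstract states that AESVRG decides adaptively when to recompute it.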
