Abstract

Successful applications of Deep Learning have brought about breakthroughs in natural language understanding, speech recognition, and computer vision. A major challenge in designing powerful Deep Learning solutions for tasks such as image classification and text parsing, however, is training Deep Neural Networks (DNNs) properly. Recent research has raised serious doubts about adaptive gradient methods, which were popularized for converging faster and requiring less hyperparameter tuning than nonadaptive gradient methods: one recent study reports that adaptive methods yield worse training loss and test error than their nonadaptive counterparts. In this paper, we revisit this question, evaluating several nonadaptive and adaptive gradient methods, including the recently proposed adaptive algorithm AMSGrad, which seeks to address shortcomings of earlier adaptive gradient methods. We focus on the MNIST optical character recognition benchmark, one of the most widely used tasks in machine learning research, to investigate the differences between adaptive and nonadaptive gradient methods when training DNNs.
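
As a rough illustration of the kind of comparison described above, the sketch below trains a small fully connected network on MNIST with one nonadaptive optimizer (SGD with momentum) and two adaptive ones (Adam and its AMSGrad variant, enabled in PyTorch via amsgrad=True). The network architecture, learning rate, batch size, and epoch count are illustrative assumptions and are not the settings used in the study.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_model():
    # Small MLP classifier for 28x28 MNIST digits (illustrative architecture only).
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def make_optimizer(name, params, lr=1e-3):
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)       # nonadaptive baseline
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)                    # adaptive method
    if name == "amsgrad":
        return torch.optim.Adam(params, lr=lr, amsgrad=True)      # AMSGrad variant of Adam
    raise ValueError(f"unknown optimizer: {name}")

def train(name, epochs=3):
    data = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=128, shuffle=True)
    model = make_model()
    optimizer = make_optimizer(name, model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        total = 0.0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"{name}: epoch {epoch + 1}, mean training loss {total / len(loader):.4f}")

if __name__ == "__main__":
    for name in ("sgd", "adam", "amsgrad"):
        train(name)

Comparing the printed training-loss curves (and, in a fuller setup, held-out test error) for the three optimizers mirrors the kind of evaluation the abstract describes.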
