Abstract

Gradient descent is the core and foundation of neural network training, and gradient descent optimization heuristics have greatly accelerated progress in deep learning. Although these methods are simple and effective, why they work so well is still not fully understood, and gradient descent optimization in deep learning has become a hot research topic. Some research efforts have tried to combine multiple methods to assist network training, but such combinations tend to be empirical, without theoretical guidance. In this paper, a framework is proposed to illustrate the principle of combining different gradient descent optimization methods, based on an analysis of several adaptive methods and other learning rate methods. Furthermore, inspired by the principles of warmup, CLR, and SGDR, the concept of multistage training is introduced into gradient descent optimization, and a multistage, method-combination strategy for training deep learning models is presented. The effectiveness of the proposed strategy is verified through extensive deep neural network training experiments.
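
The multistage, method-combination idea can be pictured as switching the optimizer and learning rate schedule at fixed epoch boundaries. The following is only an illustrative sketch in PyTorch, not the paper's exact recipe: the particular stages (a short warmup, an Adam stage with cosine decay, and a final SGD-with-momentum stage), their lengths, and all hyperparameters are assumptions made for demonstration.

```python
# Illustrative multistage training loop (PyTorch). The concrete stages below
# are assumptions chosen for demonstration, not the paper's exact strategy.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)          # placeholder model
data = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(8)]
loss_fn = nn.CrossEntropyLoss()

def make_stage(name, epochs):
    """Return (optimizer, scheduler) for a named stage (hypothetical choices)."""
    if name == "warmup":              # stage 1: short linear warmup with plain SGD
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda e: (e + 1) / epochs)
    elif name == "adam_cosine":       # stage 2: Adam combined with cosine decay
        opt = torch.optim.Adam(model.parameters(), lr=0.001)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    else:                             # stage 3: SGD with momentum for the final phase
        opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched

stages = [("warmup", 2), ("adam_cosine", 5), ("sgd_momentum", 3)]  # assumed schedule

for stage_name, stage_epochs in stages:
    optimizer, scheduler = make_stage(stage_name, stage_epochs)
    for epoch in range(stage_epochs):
        for x, y in data:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()              # advance the learning rate schedule once per epoch
```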

Highlights

  • Today, thanks to the contribution of deep learning and deep neural networks, artificial intelligence (AI) is a thriving field with many practical applications and active research topics

  • Learning rate decay methods such as cosine decay and adaptive learning rate methods such as RMSprop [4] and Adam [5] are widely used in practical neural network training. The methods based on gradient estimation, such as Momentum [6] and Nesterov Accelerated Gradient (NAG) [7], are able to facilitate neural network model training (a PyTorch instantiation of these methods is sketched after these highlights)

  • We choose the most intuitive way to demonstrate this, comparing the performance of different adjustment strategies across methods and executing 10 epochs of training for each method. The performances are shown in Tables 6 and 7
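
As a concrete reference for the methods named in these highlights, the snippet below shows how the adaptive methods (RMSprop, Adam), the gradient-estimation methods (Momentum, NAG), and a cosine learning rate decay are typically instantiated in PyTorch. The hyperparameter values are common illustrative defaults, not the settings used in the paper's experiments.

```python
# Typical PyTorch constructions of the methods mentioned above.
# Hyperparameter values are common defaults, not the paper's settings.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model

rmsprop = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
momentum_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
nag_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

# Cosine learning rate decay attached to any one of the optimizers above.
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(adam, T_max=10)
```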


Introduction

Thanks to the contribution of deep learning and deep neural networks, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. Gradient descent is the core and foundation of a neural network: just as a car is built from many parts with the engine at its core, a deep neural network (DNN) is built from many parts with gradient descent optimization at its core. In the field of gradient descent optimization, quite a few methods have been proposed to improve the training performance of neural networks. The methods based on gradient estimation, such as Momentum [6] and Nesterov Accelerated Gradient (NAG) [7], are able to facilitate neural network model training. Although these methods work well, they are usually used alone for neural network model training. Loshchilov and Hutter [8] found that using a learning rate multiplier method can substantially improve Adam's performance, and they advocate not overlooking the combined use of learning rate methods with Adam. Such combinations can achieve certain improvements.
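
Loshchilov and Hutter's observation amounts to scaling Adam's base learning rate by a decaying multiplier during training. Below is a minimal PyTorch sketch of this combination, assuming a cosine-shaped multiplier applied once per epoch; the multiplier shape, base learning rate, and epoch count are illustrative assumptions, not the settings from [8].

```python
# Adam combined with a cosine learning rate multiplier (illustrative settings).
import math
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

total_epochs = 30
# The multiplier decays from 1.0 toward 0.0 over training;
# LambdaLR scales the base learning rate by the returned factor.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs)),
)

for epoch in range(total_epochs):
    # ... run one epoch of forward/backward/optimizer.step() here ...
    scheduler.step()                          # apply the decayed multiplier for the next epoch
```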
