Abstract

The field of neural network training and optimization has seen significant advances in recent years, with new techniques and algorithms proposed to improve the efficiency and effectiveness of training. In this paper, we review several key optimization techniques and their impact on training neural networks, with a focus on long-term dependencies and the difficulties that can arise during training. We begin by discussing the challenges of learning long-term dependencies with gradient descent, as highlighted in the 1994 paper by Bengio et al. We then introduce Adam, a method for stochastic optimization proposed by Kingma and Ba in 2014, and examine the difficulties of training recurrent neural networks discussed in the 2013 paper by Pascanu, Mikolov, and Bengio. We also review more recent advances in adaptive optimization, including "On the Convergence of Adam and Beyond" (Reddi et al., 2018), Yogi (Zaheer et al., 2018), AdaBound (Luo et al., 2019), and "On the Variance of the Adaptive Learning Rate and Beyond" (Liu et al., 2019). We highlight the advantages and disadvantages of each technique and discuss their potential impact on the field. Overall, this paper provides a comprehensive overview of recent advancements in neural network optimization and their implications for training and performance.

Keywords: deep neural networks, optimization, Adam, gradient clipping, training, stabilization, overfitting, generalization, recurrent neural networks, non-convex loss landscapes.
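To make the central method concrete, the following minimal NumPy sketch illustrates a single Adam update as described by Kingma and Ba (2014). The function name adam_step and the default hyperparameter values are illustrative choices for this sketch, not code taken from the reviewed papers.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Update biased estimates of the first and second moments of the gradient.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias-correct the moment estimates (t is the 1-based step count).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Apply the update with an adaptive per-coordinate step size.
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v

Later methods discussed in the paper (AMSGrad, Yogi, AdaBound, RAdam) modify how the second-moment term v is accumulated or how the effective learning rate is bounded or warmed up, while keeping this overall structure.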
