Abstract

Improving the performance of deep learning models and reducing their training times are ongoing challenges for deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the network. Such deeper networks not only take longer to train but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models to prevent vanishing gradients, and we develop a training strategy that enables or disables gradient amplification across epochs run at different learning rates. We perform experiments on VGG-19 and ResNet (ResNet-18 and ResNet-34) models and study the impact of the amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing them to achieve higher performance with reduced training time.
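
As a rough illustration of how such a scheme can be wired up, the sketch below uses PyTorch backward hooks to scale gradients at selected layers, together with a simple schedule that switches amplification on or off across epochs run at different learning rates. This is a minimal sketch, not the authors' implementation: the small VGG-style network, the amplification factor of 2.0, the choice of Batch Normalization layers, the synthetic data, and the epoch schedule are all illustrative assumptions.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A small VGG-style network stands in for the paper's VGG-19/ResNet models.
    def conv_block(c_in, c_out):
        return [nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
                nn.ReLU(), nn.MaxPool2d(2)]

    model = nn.Sequential(*conv_block(3, 32), *conv_block(32, 64),
                          nn.Flatten(), nn.Linear(64 * 8 * 8, 10))

    AMP_FACTOR = 2.0          # assumed amplification factor
    amplify_enabled = False   # toggled by the schedule below

    def amplification_hook(module, grad_input, grad_output):
        # Scale gradients flowing toward earlier layers while amplification is on.
        if not amplify_enabled:
            return None
        return tuple(g * AMP_FACTOR if g is not None else None for g in grad_input)

    # Amplify at Batch Normalization layers (ReLU layers could be chosen instead).
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.register_full_backward_hook(amplification_hook)

    # Synthetic stand-in data so the sketch runs end to end.
    dataset = torch.utils.data.TensorDataset(
        torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
    loader = torch.utils.data.DataLoader(dataset, batch_size=64)

    # Illustrative schedule: (learning rate, epochs, amplification on/off).
    schedule = [(0.1, 3, True), (0.1, 1, False), (0.01, 2, False)]
    criterion = nn.CrossEntropyLoss()
    for lr, epochs, use_amplification in schedule:
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        amplify_enabled = use_amplification
        for _ in range(epochs):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()   # hooks amplify gradients at the selected layers
                optimizer.step()

Attaching the same hook to ReLU layers instead of Batch Normalization layers is the other layer choice compared in the experiments.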

Highlights

  • Our experiments show that for VGG-19 models, amplifying Rectified Linear Unit (ReLU) layers improves performance, but the best performance is achieved when Batch Normalization (BN) layers are chosen for amplification

  • We propose a novel gradient amplification method to dynamically increase gradients during backpropagation

Introduction

Deep learning models have achieved state-of-the-art performance in several areas, including computer vision[1], automatic speech recognition[2], natural language processing[3], and beyond[4–8]. These models are designed, trained, and tuned to achieve the best possible performance on a given dataset. Depending on the activation functions and the network architecture, gradients can become very small and diminish further as they are backpropagated toward the initial layers. This prevents those layers from updating their weights effectively, and when the gradients become vanishingly small, the network may stop training (updating weights) altogether.
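
This effect is easy to observe in a toy setting. The short sketch below (an illustration, not code from the paper) builds a deep sigmoid network in PyTorch and compares the gradient norms of its first and last layers after one backward pass; the depth, layer widths, and random data are arbitrary assumptions.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A deep fully connected network with sigmoid activations; the sigmoid
    # derivative is at most 0.25, so gradients shrink layer by layer as they
    # are backpropagated toward the first layers.
    layers = []
    for _ in range(30):
        layers += [nn.Linear(64, 64), nn.Sigmoid()]
    layers.append(nn.Linear(64, 10))
    net = nn.Sequential(*layers)

    x = torch.randn(128, 64)
    y = torch.randint(0, 10, (128,))
    loss = nn.CrossEntropyLoss()(net(x), y)
    loss.backward()

    print("gradient norm, first layer:", net[0].weight.grad.norm().item())
    print("gradient norm, last layer: ", net[-1].weight.grad.norm().item())
    # The first layer's gradient norm is typically orders of magnitude smaller,
    # so its weights barely change during training.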
