Abstract

Improving the performance of deep learning models and reducing their training times are ongoing challenges for deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the network. Such deeper networks not only take longer to train but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models to prevent vanishing gradients, and we develop a training strategy that enables or disables gradient amplification across epochs run at different learning rates. We perform experiments on VGG-19 and ResNet (ResNet-18 and ResNet-34) models and study the impact of the amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing them to achieve higher performance with reduced training time.
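
As a rough illustration of how such a scheme can be wired up, the sketch below uses PyTorch backward hooks to scale gradients at selected layers, together with a simple schedule that switches amplification on or off across epochs run at different learning rates. This is a minimal sketch, not the authors' implementation: the small VGG-style network, the amplification factor of 2.0, the choice of Batch Normalization layers, the synthetic data, and the epoch schedule are all illustrative assumptions.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A small VGG-style network stands in for the paper's VGG-19/ResNet models.
    def conv_block(c_in, c_out):
        return [nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
                nn.ReLU(), nn.MaxPool2d(2)]

    model = nn.Sequential(*conv_block(3, 32), *conv_block(32, 64),
                          nn.Flatten(), nn.Linear(64 * 8 * 8, 10))

    AMP_FACTOR = 2.0          # assumed amplification factor
    amplify_enabled = False   # toggled by the schedule below

    def amplification_hook(module, grad_input, grad_output):
        # Scale gradients flowing toward earlier layers while amplification is on.
        if not amplify_enabled:
            return None
        return tuple(g * AMP_FACTOR if g is not None else None for g in grad_input)

    # Amplify at Batch Normalization layers (ReLU layers could be chosen instead).
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.register_full_backward_hook(amplification_hook)

    # Synthetic stand-in data so the sketch runs end to end.
    dataset = torch.utils.data.TensorDataset(
        torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
    loader = torch.utils.data.DataLoader(dataset, batch_size=64)

    # Illustrative schedule: (learning rate, epochs, amplification on/off).
    schedule = [(0.1, 3, True), (0.1, 1, False), (0.01, 2, False)]
    criterion = nn.CrossEntropyLoss()
    for lr, epochs, use_amplification in schedule:
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        amplify_enabled = use_amplification
        for _ in range(epochs):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()   # hooks amplify gradients at the selected layers
                optimizer.step()

Attaching the same hook to ReLU layers instead of Batch Normalization layers is the other layer choice compared in the experiments.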

Highlights

  • Our experiments show that for VGG-19 models, amplifying Rectified Linear Unit (ReLU) layers improves performance, but the best performance is achieved when Batch Normalization (BN) layers are chosen for amplification

  • We propose a novel gradient amplification method to dynamically increase gradients during backpropagation

Introduction

Deep learning models have achieved state-of-the-art performance in several areas, including computer vision[1], automatic speech recognition[2], natural language processing[3], and beyond[4–8]. These models are designed, trained, and tuned to achieve the best possible performance on a given dataset. Depending on the activation functions and the network architecture, gradients can become very small and diminish further as they are backpropagated toward the initial layers. This prevents those layers from updating their weights effectively, and when the gradients become vanishingly small, the network may stop training (updating weights) altogether.
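
This effect is easy to observe in a toy setting. The short sketch below (an illustration, not code from the paper) builds a deep sigmoid network in PyTorch and compares the gradient norms of its first and last layers after one backward pass; the depth, layer widths, and random data are arbitrary assumptions.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # A deep fully connected network with sigmoid activations; the sigmoid
    # derivative is at most 0.25, so gradients shrink layer by layer as they
    # are backpropagated toward the first layers.
    layers = []
    for _ in range(30):
        layers += [nn.Linear(64, 64), nn.Sigmoid()]
    layers.append(nn.Linear(64, 10))
    net = nn.Sequential(*layers)

    x = torch.randn(128, 64)
    y = torch.randint(0, 10, (128,))
    loss = nn.CrossEntropyLoss()(net(x), y)
    loss.backward()

    print("gradient norm, first layer:", net[0].weight.grad.norm().item())
    print("gradient norm, last layer: ", net[-1].weight.grad.norm().item())
    # The first layer's gradient norm is typically orders of magnitude smaller,
    # so its weights barely change during training.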
