Abstract

A recurrent neural network (RNN) combines variable-length input data with a hidden state that depends on previous time steps to generate output data. RNNs have been widely used in time-series analysis, and various RNN variants have been proposed, such as the standard RNN, long short-term memory (LSTM), and gated recurrent units (GRUs). In particular, LSTM and GRU have been shown experimentally to achieve higher validation and prediction accuracy than the standard RNN. Learning ability, in this context, measures how effectively the error gradient is backpropagated. This study provides a theoretical and experimental basis for the finding that LSTM and GRU exhibit more efficient gradient descent than the standard RNN, by analyzing the vanishing gradient of the standard RNN, LSTM, and GRU both theoretically and experimentally. The results show that LSTM and GRU remain robust to gradient degradation even when learning long-range input data, meaning that their learning ability exceeds that of the standard RNN on such data; consequently, LSTM and GRU achieve higher validation and prediction accuracy. In addition, it was verified whether the experimental results of river-level prediction, solar power generation prediction, and speech signal models using the standard RNN, LSTM, and GRUs are consistent with the vanishing-gradient analysis.

Highlights

  • A recurrent neural network (RNN) is a neural network model proposed in the 1980s [1,2,3] for modeling time series

  • Long short-term memory (LSTM) and gated recurrent units (GRUs) achieve higher validation and prediction accuracy than the standard RNN; it was verified whether the experimental results of river-level prediction, solar power generation prediction, and speech signal models using the standard RNN, LSTM, and GRUs are consistent with the vanishing-gradient analysis

  • Experimental results have shown that LSTM and GRUs achieve higher validation and prediction accuracy than the standard RNN by effectively overcoming the vanishing gradient problem, even for long-range dependent input and output data [6,7,9]


Summary

Introduction

A recurrent neural network (RNN) is a neural network model proposed in the 1980s [1,2,3] for modeling time series. An RNN is a neural network that combines variable-length input data with a hidden state that depends on previous time steps to produce output data. Although the main purpose of RNNs is to learn long-term dependencies, theoretical and empirical evidence shows that storing information over long intervals is difficult. To address this problem, one solution is to augment the network with an explicit memory. The first proposal of this type uses long short-term memory (LSTM) networks with special hidden units, whose natural behavior is to remember inputs for a long time. Similar to the LSTM unit, a GRU has gating units that modulate the flow of information inside the unit, but without separate memory cells [6].
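The vanishing-gradient behavior described above can be illustrated with a minimal sketch (this is not the paper's experiment; the hidden size, weight scale, and sequence length are arbitrary illustrative choices). In a standard tanh RNN, the gradient that flows back from step t to step t-1 is multiplied by the Jacobian diag(1 - h_t^2) W, so over many steps the gradient norm shrinks when the recurrent weights are small:

```python
import numpy as np

# Illustrative sketch of gradient vanishing in a standard tanh RNN.
rng = np.random.default_rng(0)
n = 16                               # hidden size (arbitrary)
W = rng.normal(size=(n, n)) * 0.1    # small-scale recurrent weights
x = rng.normal(size=(50, n)) * 0.1   # a random 50-step input sequence

# Forward pass: h_t = tanh(W h_{t-1} + x_t)
h = np.zeros(n)
hs = []
for t in range(50):
    h = np.tanh(W @ h + x[t])
    hs.append(h)

# Backward pass: repeatedly apply the Jacobian dh_t/dh_{t-1} = diag(1 - h_t^2) W.
g = np.ones(n)                       # gradient arriving at the final step
norms = []
for h_t in reversed(hs):
    g = (W.T @ g) * (1.0 - h_t ** 2)
    norms.append(np.linalg.norm(g))

# The gradient norm decays sharply as it is propagated further back in time.
print(norms[0], norms[-1])
```

Gated units such as LSTM and GRU mitigate this decay because their additive cell/state updates give the gradient a path whose effective Jacobian stays close to the identity when the gates are open, which is the mechanism the analysis in this study formalizes.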

Objectives
Results
Discussion
Conclusion
