Abstract

Recurrent neural networks (RNNs) are effective at modeling sequences for generation and classification, but their training is hampered by vanishing and exploding gradients. In this paper, we reformulate the RNN unit to learn residual functions with reference to the hidden state, instead of relying on conventional gating mechanisms such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). The residual structure has two main benefits: first, it alleviates the vanishing and exploding gradient problems over long time scales; second, it eases optimization during backward updates. In the experiments, we evaluate our layer on language modeling, emotion classification and polyphonic modeling against LSTM and GRU layers. The results show that our layer achieves state-of-the-art performance, outperforms LSTM and GRU layers in terms of speed, and reaches accuracy competitive with the other methods.
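
To make the reformulation concrete, the following is a minimal sketch of a residual recurrent cell, assuming an update of the form h_t = h_{t-1} + F(x_t, h_{t-1}); the class name, layer sizes and variable names are illustrative and are not taken from the paper.

    # Hypothetical sketch of a residual recurrent cell (not the paper's exact
    # Res-RNN equations): the identity shortcut carries the hidden state
    # forward, and the network only learns the residual transition F.
    import torch
    import torch.nn as nn

    class ResidualRNNCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.input_proj = nn.Linear(input_size, hidden_size)
            self.hidden_proj = nn.Linear(hidden_size, hidden_size)

        def forward(self, x_t, h_prev):
            # Residual branch: a plain tanh RNN transition.
            residual = torch.tanh(self.input_proj(x_t) + self.hidden_proj(h_prev))
            # Shortcut: add to the previous state instead of overwriting it.
            return h_prev + residual

    # Usage on a toy sequence (batch of 2, 5 time steps, 8 input features).
    cell = ResidualRNNCell(input_size=8, hidden_size=16)
    x = torch.randn(2, 5, 8)
    h = torch.zeros(2, 16)
    for t in range(x.size(1)):
        h = cell(x[:, t, :], h)

In this sketch the shortcut is a plain element-wise addition, which requires the input projection and the hidden state to share the same dimensionality.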

Highlights

  • Recurrent neural networks (RNNs) have proved efficient at learning sequential data, such as in acoustic modeling [1,2], natural language processing [3,4], machine translation [5,6], and sentiment analysis [7,8]

  • The fourth section presents the results of our network and compares them with a simple RNN, long short-term memory (LSTM) and the gated recurrent unit (GRU) on datasets from several domains: the airline travel information system (ATIS) database [22], the Internet movie database (IMDB) [23] and a polyphonic music database [24]

  • Our results achieve accuracy competitive with LSTM on this spoken language understanding task, and better than the simple RNN and GRU


Summary

Introduction

Recurrent neural networks (RNNs) have proved efficient at learning sequential data, such as in acoustic modeling [1,2], natural language processing [3,4], machine translation [5,6], and sentiment analysis [7,8]. In gated units such as the long short-term memory (LSTM), the output gate determines the degree to which the memory is exposed. Another gated RNN unit, the gated recurrent unit (GRU) [17], was introduced by Cho et al. in the context of machine translation. In the proposed residual recurrent networks (Res-RNN), we instead use residual learning to address the gradient issues that arise during horizontal (through-time) propagation in training. We propose our Res-RNN unit and analyze how residual learning helps to train RNNs. The fourth section presents the results of our network and compares them with a simple RNN, LSTM and GRU on datasets from several domains: the airline travel information system (ATIS) database [22], the Internet movie database (IMDB) [23] and a polyphonic music database [24]. The experiments show that our novel recurrent unit can provide state-of-the-art performance.
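
The gradient argument behind the residual formulation can be sketched as follows, assuming the update h_t = h_{t-1} + F(h_{t-1}, x_t); the exact Res-RNN equations are given in the full text, and the symbols below are illustrative.

    % Hedged sketch: Jacobian of the assumed residual update and the
    % backpropagated product over time steps k+1..T.
    \[
      \frac{\partial h_t}{\partial h_{t-1}}
        = I + \frac{\partial F(h_{t-1}, x_t)}{\partial h_{t-1}},
      \qquad
      \frac{\partial \mathcal{L}}{\partial h_k}
        = \frac{\partial \mathcal{L}}{\partial h_T}
          \prod_{t=k+1}^{T}
          \left( I + \frac{\partial F(h_{t-1}, x_t)}{\partial h_{t-1}} \right).
    \]

Because every Jacobian factor contains the identity term I, the backpropagated product is less prone to shrinking toward zero (or blowing up) than the product of purely multiplicative Jacobians in a vanilla RNN, which is the intuition behind the claim about gradient flow during through-time propagation.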

Gradient Issues
Residual-Shortcut Structure
Analysis of Res-RNN
Experiments and Discussion
ATIS Database
Findings
Polyphonic
Conclusions