Abstract

Recurrent neural networks (RNNs) operate over sequences of vectors and have been successfully applied to a variety of problems. However, it is hard for RNNs to model the variable dwell time of the hidden state underlying an input sequence. In this article, we interpret the typical RNNs, including the original RNN, standard long short-term memory (LSTM), peephole LSTM, projected LSTM, and gated recurrent unit (GRU), using a slightly extended hidden Markov model (HMM). Based on this interpretation, we propose a novel RNN, called the explicit duration recurrent network (EDRN), analogous to a hidden semi-Markov model (HSMM). It performs better than conventional LSTMs and can explicitly model any duration distribution of the hidden state. The model parameters become interpretable and can be used to infer many quantities that conventional RNNs cannot obtain. EDRN is therefore expected to extend and enrich the applications of RNNs. The interpretation also suggests that small modifications to conventional RNNs, including LSTM and GRU, can improve their performance without increasing the number of network parameters.

Highlights

  • The recurrent neural networks (RNNs) have been successfully applied to various sequence learning problems, such as speech recognition, language modeling, translation, image captioning, health detection, remote sensing, and intelligent transportation

  • We have shown that the slightly extended hidden Markov model (SE-HMM) can interpret the typical RNNs, including the original RNN, standard long short-term memory (LSTM), peephole LSTM, projected LSTM (PLSTM), and gated recurrent unit (GRU)

  • Just as the SE-HMM extends to the hidden semi-Markov model (HSMM), which can construct any probability density function (PDF) of state duration, the explicit duration recurrent network architecture, EDRN, can capture the varying duration of the hidden state that governs the sequences


Summary

INTRODUCTION

The recurrent neural networks (RNNs) have been successfully applied to various sequence learning problems, such as speech recognition, language modeling, translation, image captioning, health detection, remote sensing, and intelligent transportation. Sahin and Kozat [10] incorporate the time gap between consecutive samples as a nonlinear scaling factor on the conventional gates of the classical LSTM network and use this extended network to process nonuniformly sampled variable-length sequential data. However, this methodology cannot be extended to model unknown, varying time information underlying the input time series. 1) We use the slightly extended HMM (SE-HMM) framework to interpret and unify the typical RNNs. 2) We further extend the SE-HMM to a new HSMM that can construct any probability density function (PDF) of state duration. Based on this HSMM, we propose a novel explicit duration RNN, called EDRN, that can capture varying periods of the underlying state that governs the input sequences. 3) Small modifications to the standard LSTM and GRU can improve their performance without increasing the complexity of the networks.
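The dwell-time limitation that motivates EDRN can be illustrated with a short sketch (an illustrative assumption, not the paper's implementation): a standard HMM's self-transition probability forces a geometric duration distribution on each hidden state, whereas an explicit-duration (HSMM-style) model can store an arbitrary normalized PDF over durations, e.g. one peaked at a particular dwell time.

```python
import numpy as np

def hmm_duration_pdf(p, max_d):
    """Dwell-time PDF implied by an HMM self-transition probability p:
    P(d) = (1 - p) * p**(d - 1), i.e. geometric and monotone decreasing."""
    d = np.arange(1, max_d + 1)
    return (1 - p) * p ** (d - 1)

def explicit_duration_pdf(weights):
    """An HSMM-style explicit duration PDF: any nonnegative histogram,
    normalized. The shape is free, e.g. peaked at an intermediate duration."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

geom = hmm_duration_pdf(0.6, 10)                       # always decays from d=1
peaked = explicit_duration_pdf([0.1, 0.5, 2.0, 0.5, 0.1])  # mode at d=3
```

A geometric PDF can never place its mode at d > 1, which is why self-transitions alone cannot represent states that typically persist for several steps; the explicit histogram can.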

SLIGHT EXTENSION TO STANDARD HMM
Definition of the Model
Forward Recursion Formulas
HMM VIEW ON TYPICAL RNNS
HMM View on the Original RNN
HMM View on the Standard LSTM
HMM View on Peephole LSTM
HMM View on GRU
HMM View on the Projected LSTM
EXPLICIT DURATION RECURRENT NETWORKS
Definition of the Explicit Duration Recurrent Network
Complexity of EDRN
Inference From the Parameters
Constructing Any Parametric State Duration Distribution
EVALUATION
Outperforming PLSTM and LSTM
Variable Duration and Meaningful State
Modified GRU
CONCLUSION

