Abstract

A recurrent neural network (RNN) is a class of neural network models in which connections between neurons form a directed cycle. This creates an internal state that allows the network to exhibit dynamic temporal behavior. In this chapter, we describe several advanced RNN models for distant speech recognition (DSR). The first set of models consists of extensions of the prediction-adaptation-correction RNN (PAC-RNN). These models were inspired by the widely observed prediction, adaptation, and correction behavior in human speech recognition. The second set of models includes highway long short-term memory (LSTM) RNNs, latency-controlled bidirectional LSTM RNNs, Grid LSTM RNNs, and Residual LSTM RNNs, all of which are extensions of deep LSTM RNNs. These models are designed so that they can be optimized more effectively than basic deep LSTM RNNs. We evaluate and compare these advanced RNN models on DSR tasks using the AMI corpus.
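To make the Residual LSTM idea concrete, the sketch below shows a minimal NumPy implementation of a stack of LSTM layers in which each layer adds its input back to its output, so the signal can bypass the recurrent transform. This is an illustrative sketch, not the chapter's implementation: the function names (`lstm_cell`, `residual_lstm_stack`), the gate ordering, and the assumption that the input and hidden dimensions match are all choices made here for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM time step. Gates are stacked as [input, forget,
    output, candidate] in W (input weights), U (recurrent weights),
    and b (bias). Returns the new hidden and cell states."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    o = sigmoid(z[2 * n:3 * n]) # output gate
    g = np.tanh(z[3 * n:])      # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def residual_lstm_stack(x_seq, layer_params):
    """Run a sequence through stacked LSTM layers. Each layer emits
    h_t + x_t (a residual/skip connection), which requires the input
    and hidden dimensions to be equal -- an assumption of this sketch."""
    out = x_seq
    for (W, U, b) in layer_params:
        n = U.shape[1]
        h = np.zeros(n)
        c = np.zeros(n)
        ys = []
        for x in out:
            h, c = lstm_cell(x, h, c, W, U, b)
            ys.append(h + x)  # residual connection around the LSTM layer
        out = ys
    return out
```

The residual connection keeps each layer's contribution additive, which is the same design motivation (easing optimization of deep stacks) the chapter attributes to the highway and Residual LSTM variants.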
