Abstract

In Automatic Speech Recognition (ASR), the Time Delay Neural Network (TDNN) has proven to be an efficient network structure because of its strong context modeling ability. In addition, as a feed-forward architecture, a TDNN is faster to train than recurrent neural networks such as the Long Short-Term Memory (LSTM) network. However, unlike recurrent networks, the context in a TDNN is carefully designed and therefore limited. Although stacking LSTM layers with TDNN layers to extend the context has proven useful, the resulting model is complex and hard to train. In this paper, we focus on directly extending the context modeling capability of TDNNs by adding recurrent connections. Several new network architectures are investigated. Results on the Switchboard corpus show that the best model significantly outperforms the baseline TDNN system and is comparable with the TDNN-LSTM architecture, while its training process is much simpler than that of TDNN-LSTM.
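As a rough illustration of the general idea (not the specific architectures studied in the paper), the sketch below shows one way a recurrent connection could be added to a TDNN layer, assuming a PyTorch implementation. The class name `RecurrentTDNNLayer`, the additive merge of the recurrence, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentTDNNLayer(nn.Module):
    """Hypothetical sketch: a TDNN layer (1-D dilated convolution over a fixed
    spliced context) whose output is combined with a simple recurrent
    connection over time, extending the fixed context of a plain TDNN layer.
    This is an assumed illustration, not the paper's exact architecture."""

    def __init__(self, in_dim, out_dim, context=3, dilation=1):
        super().__init__()
        # Standard TDNN step: splice a fixed context window via a dilated conv.
        self.tdnn = nn.Conv1d(in_dim, out_dim, kernel_size=context,
                              dilation=dilation,
                              padding=(context // 2) * dilation)
        # Recurrent connection: the previous output frame feeds the current one.
        self.rec = nn.Linear(out_dim, out_dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (batch, time, in_dim)
        h = self.tdnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, out_dim)
        outputs = []
        prev = torch.zeros(x.size(0), h.size(-1), device=x.device)
        for t in range(h.size(1)):
            # Add the recurrent term to the TDNN output at each frame.
            prev = self.act(h[:, t] + self.rec(prev))
            outputs.append(prev)
        return torch.stack(outputs, dim=1)

if __name__ == "__main__":
    # Tiny usage example on random acoustic features (e.g. 40-dim filterbanks).
    layer = RecurrentTDNNLayer(in_dim=40, out_dim=64, context=3, dilation=2)
    feats = torch.randn(8, 100, 40)   # (batch, frames, feature_dim)
    print(layer(feats).shape)         # torch.Size([8, 100, 64])
```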
