Abstract
Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to address this problem and have achieved excellent results. However, because of their large size, LSTM RNNs are more prone to overfitting, especially on low-resource tasks. In addition, because the output of LSTM units is bounded, a vanishing gradient issue often persists across multiple layers. In this work, we evaluate an architecture called gated recurrent units (GRUs) to address these two problems. Compared with the LSTM RNN, the GRU network is smaller, so this model can more easily avoid overfitting. Furthermore, the output of the GRU is not constrained to be bounded, which helps alleviate the negative impact of vanishing gradients across multiple layers. We propose deep bidirectional GRUs as hybrid acoustic models and compare the similarities and differences between LSTM and GRU. We evaluate this architecture on the CHiME-2 dataset, a robust low-resource speech recognition task. Results demonstrate that our architecture outperforms LSTM, yielding a relative WER reduction of about 6% in the bidirectional case and, when combined with a baseline GMM system, a 1.1% absolute WER reduction compared to the strong baseline mixing system.
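For reference, the standard GRU update (Cho et al., 2014) that underlies the architecture discussed above can be sketched as follows. This is an illustrative single-step NumPy sketch, not the paper's exact model configuration; the weight names, shapes, and parameter layout here are assumptions.

```python
import numpy as np

def gru_step(x, h_prev, params):
    """One GRU time step (standard formulation; illustrative only).

    x      : input vector, shape (input_dim,)
    h_prev : previous hidden state, shape (hidden_dim,)
    params : dict of weight matrices W_*, U_* and bias vectors b_*
             (names are assumptions for this sketch)
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    # Update gate: how much of the new candidate state to mix in.
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])
    # Candidate activation.
    h_tilde = np.tanh(params["W_h"] @ x
                      + params["U_h"] @ (r * h_prev)
                      + params["b_h"])
    # Interpolate between the previous state and the candidate. Unlike the
    # LSTM, there is no separate memory cell or output gate, which is why a
    # GRU layer has fewer parameters than an LSTM layer of the same width.
    return (1.0 - z) * h_prev + z * h_tilde
```

In a bidirectional hybrid acoustic model, one such recurrence would run forward and another backward over the feature sequence, with the two hidden states combined before the output layer; the sketch above only shows the per-step cell computation.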