Abstract

Recurrent neural networks (RNNs) have achieved remarkable improvements in acoustic modeling recently. However, the potential of RNNs have not been utilized for modeling Urdu acoustics. The connectionist temporal classification and attention based RNNs are suffered due to the unavailability of lexicon and computational cost of training, respectively. Therefore, we explored contemporary long short-term memory and gated recurrent neural networks Urdu acoustic modeling. The efficacies of plain, deep, bidirectional and deep-directional network architectures are evaluated empirically. Results indicate that deep-directional has an advantage over the other architectures. A word error rate of 20% was achieved on a hundred words dataset of twenty speakers. It shows 15% improvement over the baseline single-layer LSTMs. It has been observed that two-layer architectures can improve performance over single-layer, however the performance is degraded with further layers. LSTM architectures were compared with gated recurrent unit (GRU) based architectures and it was found that LSTM has an advantage over GRU.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.