Abstract

Building automatic speech recognition (ASR) systems for low-resource languages is a pressing need. India is a multilingual country with more than 5000 languages, of which 22 are official. ASR systems for Indian languages are still in their infancy, mainly due to resource deficiency (e.g., lack of transcribed speech data, pronunciation lexicons, and text data). Deep neural networks (DNNs) have significantly improved the performance of ASR systems for low-resource languages. Merging data from multiple languages is a common approach to training a multilingual DNN acoustic model. In multilingual training, the hidden layers act as a global feature extractor. Multilingual ASR systems work best when the source and target languages are similar. In this work, we use two low-resource Indian languages, namely Hindi and Marathi. The two languages are closely related, belong to the same Indo-Aryan family, and share many phonemes and words. Experiments were performed with several state-of-the-art acoustic modeling and language modeling techniques. The experiments demonstrate that multilingual ASR systems consistently outperform monolingual ASR systems for both Hindi and Marathi.
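The shared-hidden-layer idea described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration (not the paper's implementation): the hidden layers are trained on pooled Hindi and Marathi data and serve as a language-independent feature extractor, while each language gets its own softmax output layer over its phoneme set. All dimensions and phoneme-set sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim acoustic features, two shared hidden
# layers, and one output head per language (sizes are illustrative).
FEAT_DIM, HIDDEN_DIM = 40, 128
PHONES = {"hindi": 48, "marathi": 50}

# Shared hidden layers: in multilingual training these weights are
# updated on data from both languages, acting as a global feature extractor.
W1 = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.1
W2 = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1

# Language-specific softmax heads, one per language's phoneme set.
heads = {lang: rng.standard_normal((HIDDEN_DIM, n)) * 0.1
         for lang, n in PHONES.items()}

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(features, lang):
    """Phoneme posteriors for a batch of frames in the given language."""
    h = np.tanh(features @ W1)        # shared layer 1
    h = np.tanh(h @ W2)               # shared layer 2
    return softmax(h @ heads[lang])   # language-specific output layer

frames = rng.standard_normal((5, FEAT_DIM))  # 5 dummy acoustic frames
post_hi = forward(frames, "hindi")
post_mr = forward(frames, "marathi")
print(post_hi.shape, post_mr.shape)  # (5, 48) (5, 50)
```

Because the hidden layers are shared, every training frame from either language improves the common representation, which is why pooling data from closely related languages such as Hindi and Marathi tends to help both.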
