Abstract

Building automatic speech recognition (ASR) systems for low-resource languages is a pressing need. India is a multilingual country with more than 5000 languages, of which 22 are official. ASR systems for Indian languages are still in their infancy, mainly due to resource deficiency (e.g., lack of transcribed speech data, pronunciation lexicons, and text data). Deep neural networks (DNNs) have significantly improved the performance of ASR systems for low-resource languages. Merging data from multiple languages is a common approach to training a multilingual DNN acoustic model. In multilingual training, the hidden layers act as a global feature extractor. Multilingual ASR systems work best when the source and target languages are similar. In this work, we use two low-resource Indian languages, namely Hindi and Marathi. The two languages are closely related, belong to the same Indo-Aryan family, and share many phonemes and words. Experiments were performed with several state-of-the-art acoustic modeling and language modeling techniques. The experiments demonstrate that multilingual ASR systems consistently outperform monolingual ASR systems for both Hindi and Marathi.
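The shared-hidden-layer idea described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration (not the paper's implementation): the hidden layers are trained on pooled Hindi and Marathi data and serve as a language-independent feature extractor, while each language gets its own softmax output layer over its phoneme set. All dimensions and phoneme-set sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim acoustic features, two shared hidden
# layers, and one output head per language (sizes are illustrative).
FEAT_DIM, HIDDEN_DIM = 40, 128
PHONES = {"hindi": 48, "marathi": 50}

# Shared hidden layers: in multilingual training these weights are
# updated on data from both languages, acting as a global feature extractor.
W1 = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.1
W2 = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1

# Language-specific softmax heads, one per language's phoneme set.
heads = {lang: rng.standard_normal((HIDDEN_DIM, n)) * 0.1
         for lang, n in PHONES.items()}

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(features, lang):
    """Phoneme posteriors for a batch of frames in the given language."""
    h = np.tanh(features @ W1)        # shared layer 1
    h = np.tanh(h @ W2)               # shared layer 2
    return softmax(h @ heads[lang])   # language-specific output layer

frames = rng.standard_normal((5, FEAT_DIM))  # 5 dummy acoustic frames
post_hi = forward(frames, "hindi")
post_mr = forward(frames, "marathi")
print(post_hi.shape, post_mr.shape)  # (5, 48) (5, 50)
```

Because the hidden layers are shared, every training frame from either language improves the common representation, which is why pooling data from closely related languages such as Hindi and Marathi tends to help both.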
