Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models

C S Anoop, A G Ramakrishnan

doi:10.1109/slt54892.2023.10022380

Publication Date: Jan 9, 2023

Affiliation: Indian Institute of Science Bangalore

#South Indian Languages #Indian Languages

Abstract

We build a single automatic speech recognition (ASR) model for several South Indian languages using a common set of intermediary labels, which can be mapped to the desired native script through simple lookup tables and a few rules. We use the Sanskrit Library Phonetic encoding as the labeling scheme, which exploits the similarity in pronunciation across the character sets of multiple Indian languages. Unlike most existing approaches, which use a common label set only for multilingual acoustic modeling, we also explore multilingual language modeling. Our unified model improves ASR performance in languages with limited amounts of speech data and in out-of-domain test conditions. It also performs reasonably well in languages that are well represented in the training data.
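
As a rough illustration of the "lookup tables and a few rules" idea, the Python sketch below maps a string of intermediary labels to Kannada or Telugu script. It is not the authors' implementation. The function name `slp1_to_native`, the table names, and the specific rules (inherent "a", dependent vowel signs after consonants, a virama when a consonant has no following vowel) are assumptions made for this example. The tables cover only a few symbols of the Sanskrit Library Phonetic basic encoding (SLP1), where a real table would cover the full inventory.

```python
# Hypothetical, heavily truncated lookup tables: SLP1 symbol -> native character.
CONSONANTS = {
    "kannada": {"k": "\u0C95", "n": "\u0CA8", "m": "\u0CAE", "l": "\u0CB2"},
    "telugu": {"k": "\u0C15", "n": "\u0C28", "m": "\u0C2E", "l": "\u0C32"},
}
# Vowels written as standalone letters (no preceding consonant).
INDEPENDENT_VOWELS = {
    "kannada": {"a": "\u0C85", "A": "\u0C86", "i": "\u0C87"},
    "telugu": {"a": "\u0C05", "A": "\u0C06", "i": "\u0C07"},
}
# Vowels written as signs attached to a consonant; "a" is inherent, so no sign.
VOWEL_SIGNS = {
    "kannada": {"a": "", "A": "\u0CBE", "i": "\u0CBF"},
    "telugu": {"a": "", "A": "\u0C3E", "i": "\u0C3F"},
}
# Virama: suppresses the inherent vowel of the preceding consonant.
VIRAMA = {"kannada": "\u0CCD", "telugu": "\u0C4D"}


def slp1_to_native(labels: str, script: str) -> str:
    """Convert a string of SLP1 labels to the given script (assumed rules)."""
    consonants = CONSONANTS[script]
    out = []
    prev_was_consonant = False
    for symbol in labels:
        if symbol in consonants:
            # Rule: a consonant directly after another consonant means the
            # first one carried no vowel, so it needs a virama.
            if prev_was_consonant:
                out.append(VIRAMA[script])
            out.append(consonants[symbol])
            prev_was_consonant = True
        elif symbol in VOWEL_SIGNS[script]:
            # Rule: use a dependent sign after a consonant, else the
            # independent vowel letter.
            table = VOWEL_SIGNS if prev_was_consonant else INDEPENDENT_VOWELS
            out.append(table[script][symbol])
            prev_was_consonant = False
        else:
            raise KeyError(f"symbol {symbol!r} not in the illustrative table")
    if prev_was_consonant:
        # A word-final consonant without a vowel also takes a virama.
        out.append(VIRAMA[script])
    return "".join(out)


if __name__ == "__main__":
    # The same label sequence rendered in two scripts.
    for script in ("kannada", "telugu"):
        print(script, slp1_to_native("kamala", script))
```

Because the label sequence is shared, one recognizer output such as `kamala` can be rendered in either script by switching tables, which is the property the abstract relies on.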
