Abstract
Automatic speech recognition has made rapid progress in recent years. However, current modeling strategies still suffer large performance degradation on low-resource languages with limited training data. In this paper, we propose a series of methods to optimize data usage for low-resource speech recognition. Multilingual speech recognition is known to help substantially in low-resource scenarios, and our work further exploits the correlation and similarity between languages during multilingual pretraining. We use the posterior of the target language, extracted from a language classifier, to weight training samples, which biases the model toward the target language during pretraining. In addition, we design dynamic curriculum learning for data allocation and length perturbation for data augmentation. These three methods together form our strategy of optimized data usage for low-resource languages. We evaluate the proposed approach by pretraining (PT) on rich-resource languages and finetuning (FT) on the target language with limited data. Experimental results show that the proposed data usage method obtains a 15 to 25% relative word error rate reduction across different target languages compared with the commonly adopted multilingual PT+FT method on the CommonVoice dataset. The same improvement and conclusion are also observed on the Babel dataset with conversational telephone speech, where a <inline-formula><tex-math notation="LaTeX">$\sim$</tex-math></inline-formula>40% relative character error rate reduction can be obtained for the target low-resource language.
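The posterior-based data weighting described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the temperature parameter, and the toy posteriors are all assumptions introduced for illustration. The idea is to score each multilingual training utterance by the language classifier's posterior probability for the target language and normalize those scores into sampling weights for pretraining.

```python
import numpy as np

def posterior_weights(posteriors, target_idx, temperature=1.0):
    """Hypothetical sketch: turn language-classifier posteriors into
    per-sample training weights so that pretraining is biased toward
    utterances resembling the target language.

    posteriors  : (num_samples, num_languages) classifier outputs
    target_idx  : column index of the target low-resource language
    temperature : >1 flattens the weighting, <1 sharpens it (assumed knob)
    """
    p = np.asarray(posteriors, dtype=float)[:, target_idx]
    p = p ** (1.0 / temperature)
    return p / p.sum()  # normalized sampling weights

# Toy example: 3 utterances, classifier posteriors over 2 languages,
# language 0 is the target low-resource language.
post = [[0.9, 0.1],
        [0.5, 0.5],
        [0.2, 0.8]]
w = posterior_weights(post, target_idx=0)
```

The resulting weights could then drive weighted sampling of pretraining batches (e.g. via `numpy.random.Generator.choice` with its `p` argument), so that utterances the classifier judges closest to the target language are seen more often.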
IEEE/ACM Transactions on Audio, Speech, and Language Processing