Abstract

Code-switching occurs when a speaker alternates between two or more languages or dialects. It is a pervasive phenomenon in most Indic spoken languages. Code-switching poses a challenge for language modeling because it complicates the orthographic realization of text, and code-switched data is generally scarce. In this paper, we investigate data augmentation and adaptation strategies for language modeling. Using Bengali and English as an example, we study augmenting the code-switched transcripts with separate transliterated Bengali and English corpora. We present results on two speech recognition tasks, namely voice search and dictation. We show improvements on both tasks with Maximum Entropy (MaxEnt) and Long Short-Term Memory (LSTM) language models (LMs). We also explore different adaptation strategies for the MaxEnt and LSTM LMs, demonstrating that the transliteration-based data-augmented LSTM LM matches an adapted MaxEnt LM trained on more Bengali-English data.
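
To make the augmentation idea concrete, below is a minimal Python sketch of mixing code-switched transcripts with transliterated monolingual text before LM training. The `transliterate` helper, the toy character map, and the sampling fractions are hypothetical illustrations, not the paper's actual pipeline; a real system would use a trained transliteration model and mixing proportions tuned on held-out data.

```python
import random

def transliterate(sentence: str, mapping: dict) -> str:
    """Toy character-level transliterator (illustrative only).

    A real system would use a trained transliteration model rather
    than a fixed character map."""
    return "".join(mapping.get(ch, ch) for ch in sentence)

# Illustrative fragment of a Bengali-to-Latin character map (hypothetical).
BN_TO_LATIN = {"\u0986": "a", "\u0995": "k", "\u0996": "kh"}

def build_lm_training_corpus(code_switched, bengali_mono, english_mono,
                             bn_frac=1.0, en_frac=1.0, seed=13):
    """Augment code-switched transcripts with monolingual text.

    Monolingual Bengali is transliterated into the orthography of the
    code-switched transcripts so all sources share one writing system;
    bn_frac/en_frac control how much monolingual text is mixed in."""
    rng = random.Random(seed)
    corpus = list(code_switched)
    # Subsample and transliterate the monolingual Bengali corpus.
    bn_sample = rng.sample(bengali_mono, int(bn_frac * len(bengali_mono)))
    corpus += [transliterate(s, BN_TO_LATIN) for s in bn_sample]
    # English text is assumed here to already match the target orthography.
    corpus += rng.sample(english_mono, int(en_frac * len(english_mono)))
    rng.shuffle(corpus)
    return corpus

if __name__ == "__main__":
    cs = ["ami call korbo now"]          # code-switched transcript
    bn = ["\u0986\u0995\u0996"]          # monolingual Bengali text
    en = ["call me now"]                 # monolingual English text
    for line in build_lm_training_corpus(cs, bn, en):
        print(line)
```

The key design point the sketch illustrates is that transliteration normalizes all training sources into a single orthography, so the LM sees Bengali and English material in the same script as the code-switched transcripts it must model.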
