Acoustic data augmentation for Mandarin-English code-switching speech recognition

Yanhua Long,Yijie Li,Qiaozheng Zhang,Shuang Wei,Hong Ye,Jichen Yang

doi:10.1016/j.apacoust.2019.107175

Abstract

Code-switching (CS) is a multilingual phenomenon where a speaker uses different languages in an utterance or between alternating utterances. Developing large-scale datasets for training code-switching acoustic and language models is challenging and extremely expensive. In this paper, we focus on the acoustic data augmentation for the Mandarin-English CS speech recognition task. Effectiveness of conventional acoustic data augmentation approaches are examined. More importantly, we propose a CS acoustic event detection system based on the deep neural network to extract real code-switching speech segments automatically. Then, the semi-supervised and active learning techniques are investigated to generate transcriptions of these segments. Finally, code-switching speech synthesis system is introduced to further enhance the acoustic modeling. Experimental results on the OC16-CE80 data, a Mandarin-English mixlingual speech corpus, demonstrate the effectiveness of the proposed methods.

Full Text