Abstract

We present a study of comprehensive approaches to improving code-switching speech recognition, using data augmentation and system combination methods. For data augmentation, we not only use a speed-perturbation-based method, but also add reverberation generated from diverse room impulse responses, as well as additive noise based on music, babble, and white noise. We find that these noise-corrupting augmentation methods still yield significant performance improvements, even though the SEAME code-switching data is a clean corpus. In addition to data augmentation, we adopt a minimum Bayes risk (MBR) based lattice combination method to further improve recognition results. We achieve significant word error rate (WER) reductions from lattice combination, both with and without recurrent neural network language model based lattice rescoring. Compared with our previous efforts [6], we achieve up to 2.29% and 5.61% absolute WER reduction on the two dev sets respectively, and 4.83% and 8.04% absolute WER reduction after system combination.
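
The abstract does not give implementation details; the snippet below is only a minimal sketch of the kind of noise-corrupting augmentation it describes, assuming NumPy/SciPy and using synthetic signals in place of real speech, room impulse responses, and noise recordings. The function names and parameters are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with a room impulse response (RIR)."""
    reverberant = fftconvolve(speech, rir, mode="full")[: len(speech)]
    # Rescale so the corrupted copy keeps roughly the original energy.
    return reverberant * (np.std(speech) / (np.std(reverberant) + 1e-12))

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix additive noise (e.g. music, babble, or white noise) at a target SNR."""
    noise = np.resize(noise, len(speech))              # loop/trim noise to length
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                 # stand-in for 1 s of 16 kHz speech
    rir = np.exp(-np.linspace(0, 8, 4000)) * rng.standard_normal(4000)
    corrupted = add_noise(add_reverb(clean, rir), rng.standard_normal(16000), snr_db=10)
```

Speed perturbation is commonly applied as a separate step by resampling the waveform at factors such as 0.9 and 1.1 before feature extraction, producing additional training copies of each utterance.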
