Mandarin-English bilingual phone modeling and combining MPE based Discriminative training for cross-language speech recognition

Yanmin Qian, Jia Liu

doi:10.1109/iscslp.2010.5684841

Abstract

Automatic multilingual speech recognition remains a difficult task. This paper presents recent work on the development of a Mandarin-English bilingual speech recognition system. First, a universal set of bilingual acoustic models based on a novel State-Time-Alignment (STA) method is proposed to balance the performance and the complexity of the bilingual speech recognition system. Then two discriminative training approaches that have been shown to improve monolingual recognition performance, namely discriminative Gaussian training using the minimum phone error (MPE) criterion and the discriminatively trained feature transform fMPE, are modified to handle the bilingual speech recognition system. A new method is applied to generate significantly better lattices for training the bilingual model, and complementary discriminative training methods are also explored to obtain the best ROVER performance in the bilingual setting. Experimental results show that the STA phone clustering method outperforms other existing phone clustering methods. Furthermore, both forms of discriminative training reduce the word error rate of the multilingual system, and combining complementary discriminative training methods improves the performance significantly.
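
For readers unfamiliar with the MPE criterion named above, the objective as it is commonly written in the discriminative training literature is sketched below. This is the general monolingual form, not the bilingual modification proposed in the paper, which the abstract does not specify. The symbols are notation assumed here for illustration: \lambda for the acoustic model parameters, O_r for the r-th training utterance, s_r for its reference transcription, \kappa for the acoustic scaling factor, and A(s, s_r) for the raw phone accuracy of hypothesis s measured against s_r.

```latex
\mathcal{F}_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R}
    \frac{\sum_{s} p_{\lambda}(O_r \mid s)^{\kappa} \, P(s) \, A(s, s_r)}
         {\sum_{u} p_{\lambda}(O_r \mid u)^{\kappa} \, P(u)}
```

Each term is the posterior-weighted average phone accuracy over the competing hypotheses for one utterance. In practice the sums over s and u are restricted to the paths of a recognition lattice, which is why the quality of the training lattices matters for this kind of training.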

Full Text