Abstract

Automatic multilingual speech recognition remains a difficult task. This paper presents recent work on the development of a Mandarin-English bilingual speech recognition system. First, a universal set of bilingual acoustic models based on a novel State-Time-Alignment (STA) method is proposed to balance the performance and the complexity of the bilingual speech recognition system. Then, discriminative training approaches such as discriminative Gaussian training using the minimum phone error (MPE) criterion and the discriminatively trained feature transform fMPE, which have been shown to improve monolingual recognition performance, are modified to handle bilingual speech recognition. A new method is applied to generate significantly better lattices for training the bilingual model, and complementary discriminative training methods are also explored to obtain the best ROVER performance in the bilingual setting. Experimental results show that the STA phone clustering method outperforms other existing phone clustering methods. Furthermore, both forms of discriminative training reduce the word error rate of the multilingual system, and combining complementary discriminative training methods improves performance significantly.

