Cross-Probability Model Based on Gmm for Feature Vector Normalization

Luis Buera,Eduardo Lleida,Óscar Saz,Antonio Miguel,Alfonso Ortega

doi:10.1007/978-0-387-79582-9_14

Abstract

In order to develop a robust man–machine interface based on speech for cars, multi-environment model-based linear normalization, MEMLIN, was presented earlier and it was proved to be effective to compensate environment mismatch. MEMLIN is an empirical feature vector normalization technique which models clean and noisy spaces with Gaussian mixture models, GMMs; and the probability of the clean model Gaussian, given the noisy model one and the noisy feature vector (cross-probability model), is a critical point. In previous works the cross-probability model was approximated as time independent in a training process. However, in this chapter, an estimation based on GMM is considered for MEMLIN. Some experiments with SpeechDat Car and Aurora2 databases were carried out in order to study the performance of the proposed estimation of the cross-probability model, obtaining important improvements: 75.53 and 62.49% of mean improvement in word error rate, WER, for MEMLIN with SpeechDat Car and a reduced set of Aurora2 database, respectively (82.86 and 67.52% if time-independent cross-probability model is applied). Although the behaviour of the technique is satisfactory, using clean acoustic models in decoding produces a mismatch because the normalization is not perfect. So, retraining acoustic models in the normalized space is proposed, reaching 97.27% of mean improvement with SpeechDat Car database.

Full Text