Abstract

The paper addresses the use of discriminative training criteria for speaker adaptive training (SAT), where both the feature-space transforms and the model parameters are estimated using the minimum phone error (MPE) criterion. In a similar fashion to the use of I-smoothing for standard MPE training, a smoothing technique is introduced to avoid over-training when optimizing MPE-based feature-space transforms. Experiments on a conversational telephone speech (CTS) transcription task demonstrate that MPE-based SAT models can reduce the word error rate over non-SAT MPE models by 1.0% absolute, after lattice-based MLLR adaptation. Moreover, a simplified implementation of MPE-SAT that uses constrained MLLR transforms in place of MPE-estimated transforms is also discussed.
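
For orientation, the MPE criterion referred to above can be sketched in its standard form, following the usual Povey-and-Woodland-style formulation; the notation here is a generic illustration rather than the paper's own:

$$
\mathcal{F}_{\mathrm{MPE}}(\lambda) \;=\; \sum_{r=1}^{R} \frac{\sum_{W} p_{\lambda}(\mathbf{O}_r \mid W)^{\kappa}\, P(W)\, A(W, W_r)}{\sum_{W'} p_{\lambda}(\mathbf{O}_r \mid W')^{\kappa}\, P(W')}
$$

where $\mathbf{O}_r$ is the observation sequence for utterance $r$, $W_r$ its reference transcription, $A(W, W_r)$ the raw phone accuracy of hypothesis $W$ against the reference, and $\kappa$ an acoustic scaling factor. In standard MPE training, I-smoothing regularizes the Gaussian parameter updates by adding a fixed count $\tau$ of ML statistics to the discriminative accumulators; the smoothing technique introduced in this paper plays an analogous role for the estimation of the MPE-based feature-space transforms.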
