Abstract

This paper investigates discriminative linear transforms (DLTs) for building speaker-adaptive speech recognition systems, in which the transform parameters are estimated with a discriminative criterion rather than maximum likelihood (ML). The minimum phone error (MPE) criterion is used for DLT estimation, and the estimation formulae are derived from a so-called weak-sense auxiliary function. DLT statistics are accumulated with a lattice-based implementation, in which a weakened language model allows more confusion data to be included. To improve DLT estimation for unsupervised adaptation, a method is developed that incorporates word correctness information about the supervision into transform estimation. Confidence scores calculated by confusion network decoding represent word correctness and weight the numerator statistics during DLT estimation, which makes the estimation less sensitive to errors in the supervision. Experiments on transcription of read newspaper data and of conversational telephone speech show that DLTs improve on maximum likelihood linear regression (MLLR) for both supervised and unsupervised adaptation, and that confidence scores improve both standard MLLR and DLT-based adaptation.
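The confidence weighting of the numerator statistics can be illustrated with a minimal sketch. It is not the paper's implementation: the `Frame` type, the function name, and the choice to scale each frame's contribution by the product of its word confidence and its Gaussian occupancy are assumptions made for illustration only, and only the occupancy and first-order sums for a single Gaussian are shown.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Frame:
    """One frame of adaptation data (hypothetical container)."""
    observation: Sequence[float]  # acoustic feature vector
    posterior: float              # numerator occupancy of one Gaussian at this frame
    confidence: float             # confidence of the hypothesised word covering the frame, in [0, 1]


def accumulate_numerator_stats(frames: List[Frame]) -> Tuple[float, List[float]]:
    """Return (occupancy, first-order sum) with each frame scaled by its word confidence.

    Assumption: a low-confidence word contributes proportionally less to the
    numerator statistics, so likely errors in the supervision carry less weight.
    """
    if not frames:
        return 0.0, []
    dim = len(frames[0].observation)
    occupancy = 0.0
    first_order = [0.0] * dim
    for frame in frames:
        weight = frame.confidence * frame.posterior
        occupancy += weight
        for d in range(dim):
            first_order[d] += weight * frame.observation[d]
    return occupancy, first_order


if __name__ == "__main__":
    demo = [
        Frame(observation=[1.0, 2.0], posterior=1.0, confidence=0.5),
        Frame(observation=[3.0, 4.0], posterior=0.5, confidence=1.0),
    ]
    print(accumulate_numerator_stats(demo))
```

In this sketch, a frame from a word with confidence 0.5 counts half as much as the same frame from a word with confidence 1.0. Setting every confidence to 1.0 recovers the unweighted statistics.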
