Abstract

This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. This method is effective even for speakers whose decoding using speaker-independent (SI) models are error-prone and for whom speaker adaptation techniques are truly needed. In addition, smoothed estimation and utterance verification are introduced into this method. The smoothed estimation is based on the likelihood values for adapted models of word sequences obtained by N-best decoding and improves the performance of error-prone speakers, and the utterance verification technique reduces the amount of calculation required. Performance evaluation using connected-digit (four-digit strings) recognition experiments performed over actual telephone lines showed a reduction of 36·4% in the error rates of speakers whose decoding using SI models are error-prone.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.