N-Best-based unsupervised speaker adaptation for speech recognition

Tomoko Matsui, Sadaoki Furui

doi:10.1006/csla.1997.0036

Abstract

This paper proposes an instantaneous speaker adaptation method that uses N-best decoding for continuous mixture-density hidden-Markov-model-based speech-recognition systems. The method is effective even for speakers whose decoding with speaker-independent (SI) models is error-prone, that is, for the speakers who most need speaker adaptation. In addition, smoothed estimation and utterance verification are introduced into the method. The smoothed estimation is based on the likelihood values for adapted models of the word sequences obtained by N-best decoding and improves performance for error-prone speakers; the utterance verification technique reduces the amount of calculation required. Performance evaluation using connected-digit (four-digit string) recognition experiments performed over actual telephone lines showed a 36.4% reduction in the error rate for speakers whose decoding with SI models is error-prone.
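The abstract does not give the smoothing formula, so the following is only an illustrative sketch of one plausible reading: the model parameters adapted under each N-best hypothesis are averaged with weights derived from the hypothesis likelihoods. The function names `posterior_weights` and `smoothed_mean`, the use of normalized likelihoods as weights, and the restriction to mean vectors are all assumptions made for illustration, not the authors' stated procedure.

```python
import math


def posterior_weights(log_likelihoods):
    """Turn N-best log-likelihoods into weights that sum to 1.

    Assumption: weights are proportional to the hypothesis likelihoods.
    The maximum is subtracted before exponentiating for numerical stability.
    """
    top = max(log_likelihoods)
    scaled = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def smoothed_mean(adapted_means, log_likelihoods):
    """Likelihood-weighted average of per-hypothesis adapted mean vectors.

    adapted_means[i] is a (hypothetical) mean vector adapted under the
    i-th N-best word sequence; log_likelihoods[i] is the log-likelihood
    of the utterance under that adapted model.
    """
    weights = posterior_weights(log_likelihoods)
    dim = len(adapted_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, adapted_means))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two hypotheses with equal likelihood: the result is the plain average.
    print(smoothed_mean([[0.0, 2.0], [2.0, 4.0]], [-10.0, -10.0]))
```

Under this reading, a speaker whose top hypothesis is wrong still contributes adaptation statistics from the lower-ranked, possibly correct hypotheses, which is consistent with the abstract's claim that smoothing helps error-prone speakers.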
