Abstract

Achieving high-precision speech recognition in real environments requires phone model adaptation procedures that can rapidly account for a wide range of speakers and acoustic noise conditions. In this paper we propose an unsupervised speaker adaptation method that extends an unsupervised speaker and environment adaptation method based on the sufficient statistics of HMMs by applying spectral subtraction and then adding a known noise to the input. Existing methods assume that a separate model is trained to match each type of background noise expected at recognition time, and they do not consider variations in the signal-to-noise ratio or changes in the background noise of a given input. In contrast, our method suppresses the noise in the input data using an estimate of the noise spectrum and then adds a known stationary noise to the bleached noise that remains in the input, thereby smoothing out differences between background noises and enabling recognition with a single set of acoustic models. For speaker adaptation, we select the set of closest speakers from our database on the basis of a single arbitrary utterance from the test speaker and retrain the acoustic models using the sufficient statistics of those speakers. Combining these two methods allows rapid and accurate adaptation to a new speaker. In recognition experiments at a signal-to-noise ratio of 20 dB under a variety of noise conditions, the proposed method achieved a recognition rate 2 percent higher than that of a speaker-independent model matched to each test noise environment, for an average recognition performance of 85.1 percent overall. We also compare our method with a standard supervised adaptation technique, maximum likelihood linear regression (MLLR). © 2005 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 88(8): 30–41, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20199
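The noise-smoothing front end summarized above can be illustrated with a short sketch. The following Python fragment is a minimal illustration under stated assumptions, not the authors' implementation: it performs frame-wise spectral subtraction using a noise spectrum estimated from a noise-only segment, then adds a known stationary noise so that every utterance is recognized under a single noise condition. The function and parameter names (`smooth_background_noise`, `frame_len`, `floor`, and so on) are hypothetical choices made for this example.

```python
import numpy as np

def smooth_background_noise(signal, noise_sample, known_noise,
                            frame_len=256, hop=128, floor=0.05):
    """Spectral subtraction followed by addition of a known stationary noise."""
    window = np.hanning(frame_len)
    # Estimate the background-noise magnitude spectrum from a noise-only segment.
    noise_mag = np.abs(np.fft.rfft(noise_sample[:frame_len] * window))

    cleaned = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the noise estimate, keeping a spectral floor to limit musical noise.
        mag = np.maximum(mag - noise_mag, floor * mag)
        cleaned[start:start + frame_len] += np.fft.irfft(mag * np.exp(1j * phase), n=frame_len)

    # Add the known stationary noise so every input shares one noise condition.
    return cleaned + known_noise[:len(cleaned)]


# Example usage with synthetic 16 kHz data (one second).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    speech = np.sin(2 * np.pi * 220 * t)            # stand-in for a speech signal
    background = 0.1 * rng.standard_normal(sr)      # unknown background noise
    known = 0.05 * rng.standard_normal(sr)          # known stationary noise to add
    out = smooth_background_noise(speech + background, background, known)
    print(out.shape)
```

The speaker-selection step could be sketched in a similarly hedged way. The fragment below assumes that each training speaker is represented by a model exposing a `score(X)` method (for example, scikit-learn's `GaussianMixture`) and that per-speaker sufficient statistics (occupancy counts and first- and second-order sums) have been precomputed; speakers are ranked by how well they match the single test utterance, and the Gaussian means and variances are re-estimated from the pooled statistics of the selected speakers. The data layout and the selection criterion are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def select_and_adapt(test_feats, speaker_models, speaker_stats, n_select=40):
    """Pick the training speakers closest to one test utterance and
    re-estimate Gaussian means/variances from their sufficient statistics."""
    # Rank speakers by the average log-likelihood of the test utterance
    # under each speaker's model (e.g. sklearn GaussianMixture.score).
    scores = np.array([model.score(test_feats) for model in speaker_models])
    chosen = np.argsort(scores)[-n_select:]

    # Pool the precomputed sufficient statistics of the selected speakers:
    # occ    -- per-Gaussian occupancy counts       (shape: [n_gauss])
    # first  -- per-Gaussian sums of features       (shape: [n_gauss, dim])
    # second -- per-Gaussian sums of squared feats  (shape: [n_gauss, dim])
    occ = sum(speaker_stats[i]["occ"] for i in chosen)
    first = sum(speaker_stats[i]["first"] for i in chosen)
    second = sum(speaker_stats[i]["second"] for i in chosen)

    # Standard ML re-estimates for diagonal-covariance Gaussians.
    means = first / occ[:, None]
    variances = second / occ[:, None] - means ** 2
    return chosen, means, variances
```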
