Abstract
High-precision speech recognition in real environments requires phone model adaptation procedures that can rapidly account for a wide range of speakers and acoustic noise conditions. In this paper we propose an unsupervised speaker adaptation method that extends an unsupervised speaker and environment adaptation method based on HMM sufficient statistics by performing spectral subtraction on the input and then adding a known noise to it. Existing methods assume that a model is trained to match each type of background noise to be encountered during recognition, and they do not consider variations in the signal-to-noise ratio or changes in the background noise for a given input. In contrast, our method suppresses the noise in the input data using an estimate of the noise spectrum and then adds a known stable noise to the bleached noise that remains in the input, thereby smoothing out differences between background noises and enabling recognition with a single set of acoustic models. For speaker adaptation, we select the set of closest speakers from our database on the basis of a single arbitrary utterance from the test speaker and retrain the acoustic models using the sufficient statistics of those speakers. By combining these two methods we are able to adapt rapidly and accurately to a new speaker. In recognition experiments at a signal-to-noise ratio of 20 dB under a variety of noise conditions, the proposed method achieved a recognition rate 2 percent higher than that of a speaker-independent model matched to the test noise environment in each noise environment, with an average recognition performance of 85.1 percent overall. We also compared our method with a standard supervised adaptation technique, maximum likelihood linear regression (MLLR). © 2005 Wiley Periodicals, Inc.
Electron Comm Jpn Pt 2, 88(8): 30–41, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20199
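
The noise-handling step described in the abstract (spectral subtraction followed by the addition of a known noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the over-subtraction factor `alpha`, and the spectral floor `floor` are assumptions introduced for illustration, and the abstract does not specify any of them. Each frame is represented as a plain list of power-spectrum values, one per frequency bin.

```python
def estimate_noise(noise_frames):
    """Average the power in each bin over frames assumed to hold noise only."""
    n_bins = len(noise_frames[0])
    n_frames = len(noise_frames)
    return [sum(frame[b] for frame in noise_frames) / n_frames
            for b in range(n_bins)]


def subtract_then_add_noise(frame_power, noise_estimate, known_noise,
                            alpha=2.0, floor=0.1):
    """Process one power-spectrum frame (hypothetical helper).

    alpha and floor are illustrative values, not taken from the paper.
    """
    if not (len(frame_power) == len(noise_estimate) == len(known_noise)):
        raise ValueError("all spectra must have the same number of bins")
    processed = []
    for x, n_hat, n_known in zip(frame_power, noise_estimate, known_noise):
        # Spectral subtraction: remove the scaled noise estimate, but never
        # drop below a small fraction of the input power (spectral floor).
        cleaned = max(x - alpha * n_hat, floor * x)
        # Add the known, stable noise on top of whatever residual remains.
        processed.append(cleaned + n_known)
    return processed


if __name__ == "__main__":
    noise = estimate_noise([[1.0, 3.0, 2.0], [3.0, 1.0, 2.0]])
    print(subtract_then_add_noise([10.0, 4.0, 1.0], noise, [0.5, 0.5, 0.5]))
```

Because every input receives the same known noise after subtraction, inputs recorded under different background noises end up closer to one another, which is the property the abstract relies on to use a single set of acoustic models.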