Abstract

We propose an unsupervised method for training phoneme models that uses sufficient statistics from hidden Markov models together with a speaker distance function. The method builds an accurately adapted model from sufficient statistics computed for the acoustic models of speakers who are acoustically close to a test speaker. It involves (1) selecting a group of speakers who are acoustically close to the test speaker and (2) creating phoneme models adapted to the test speaker from the sufficient statistics of the selected speakers' models. The sufficient statistics are calculated off-line, before adaptation. Because adaptation draws on the statistics of acoustically close speakers, the method achieves high recognition rates while requiring only a small amount of adaptation data, and because the statistics are precomputed, adaptation is fast. Compared with speaker clustering methods, the proposed method determines a more appropriate speaker cluster, since the cluster is selected dynamically on-line using data from the test speaker. In recognition experiments with small amounts of adaptation data, the proposed method achieves a higher recognition rate than MLLR.

© 2005 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 88(9): 33–41, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20188
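
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single diagonal-covariance Gaussian per speaker in place of full HMM phoneme models, and it uses Euclidean distance between means as a stand-in for the speaker distance function, which the abstract does not specify. The names `SpeakerStats`, `select_closest`, and `combine` are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class SpeakerStats:
    """Sufficient statistics of one Gaussian for one training speaker.

    Assumed to be precomputed off-line, before any adaptation takes place.
    """
    name: str
    count: float          # occupancy count (number of frames)
    first: List[float]    # per-dimension sum of observations
    second: List[float]   # per-dimension sum of squared observations

    def mean(self) -> List[float]:
        return [s / self.count for s in self.first]


def select_closest(speakers: List[SpeakerStats],
                   test_mean: List[float],
                   n: int) -> List[SpeakerStats]:
    """Step 1: pick the n speakers closest to the test speaker.

    Assumption: Euclidean distance between means stands in for the
    paper's speaker distance function.
    """
    return sorted(speakers, key=lambda s: dist(s.mean(), test_mean))[:n]


def combine(selected: List[SpeakerStats]) -> Tuple[List[float], List[float]]:
    """Step 2: pool the selected statistics into one adapted Gaussian."""
    total = sum(s.count for s in selected)
    dim = len(selected[0].first)
    mean = [sum(s.first[d] for s in selected) / total for d in range(dim)]
    var = [sum(s.second[d] for s in selected) / total - mean[d] ** 2
           for d in range(dim)]
    return mean, var
```

Only the selection and pooling run at adaptation time, so the cost does not depend on the amount of training audio behind each speaker's statistics, which is consistent with the abstract's point that precomputing the statistics makes adaptation fast.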
