Acoustic model adaptation by selective training using two‐stage clustering

Shoei Sato, Kazuo Onoe, Hiroyuki Segi, Haruo Isono, Toru Imai, Eiichi Miyasaka, Akio Ando

doi:10.1002/ecjc.20157

Abstract

In speech recognition systems where the speaker and utterance environment cannot be specified in advance, recognition accuracy drops when the input speech does not match the acoustic model's training data. This problem is normally addressed by speaker adaptation, but adaptation cannot achieve sufficient accuracy unless good‐quality adaptation data are available. In this paper, the authors propose a method of efficiently clustering large‐scale data, using the likelihoods of a cluster model created from small‐scale data as the criterion, to obtain a high‐precision adapted acoustic model. They also propose a method of using the cluster model to automatically select the adapted acoustic model during recognition from only the beginning of each sentence of the input speech. News speech recognition experiments show that the adapted acoustic model can still be selected accurately when only the first 0.5 seconds of each input sentence is used, and that the proposed technique reduces recognition errors by 20% and recognition time by 23% compared with not using the adapted acoustic model for each cluster. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 88(2): 41–51, 2005; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20157
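
The two ideas in the abstract, likelihood‐based assignment of large‐scale data to clusters and model selection from the start of a sentence, can be illustrated with a minimal sketch. This is not the authors' implementation: a single diagonal Gaussian stands in for each cluster model, the 10 ms frame shift is an assumption, every function name is hypothetical, and the adaptation of an acoustic model on each resulting cluster is omitted.

```python
import math

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (assumed stand-in for a cluster model) to seed frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so tiny seed sets do not produce zero variance.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under one cluster model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def select_cluster(frames, cluster_models):
    """Return the index of the cluster model with the highest likelihood."""
    scores = [avg_log_likelihood(frames, m) for m in cluster_models]
    return max(range(len(scores)), key=scores.__getitem__)

def assign_large_corpus(utterances, cluster_models):
    """Stage two: assign each large-corpus utterance to its best-scoring cluster."""
    clusters = {k: [] for k in range(len(cluster_models))}
    for utt in utterances:
        clusters[select_cluster(utt, cluster_models)].append(utt)
    return clusters

def select_from_sentence_start(frames, cluster_models,
                               frame_shift_s=0.01, window_s=0.5):
    """Pick a cluster from only the first window_s seconds of the sentence.

    frame_shift_s (10 ms) is an assumed front-end setting, not from the paper.
    """
    n = max(1, int(round(window_s / frame_shift_s)))
    return select_cluster(frames[:n], cluster_models)
```

In this sketch the cluster models are fitted once on small seed sets and then reused unchanged for both steps, which reflects the abstract's point that the same cluster model drives the clustering of the training data and the selection at recognition time.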
