Abstract

When a speech recognition system is used in a real environment, surrounding noise is inevitably mixed into the input speech and degrades recognition performance. In most cases the noise cannot be predicted, and the resulting mismatch between the noise environment of the input signal and that of the acoustic model is one cause of the degradation. It is therefore desirable to construct an acoustic model that is robust to many kinds of noise. The noise mixture problem has two aspects: the variety of noise types and the variety of SNR values. In this paper, HMM composition using weight adaptation of the noise GMM is applied to the first problem, and the multi-SNR path model is applied to the second. The combination of the two approaches is evaluated in speech recognition experiments in noisy environments, using the travel conversation task and the AURORA2 task. When 1 second of adaptation data is used in the AURORA2 task at SNR = 5 dB, the recognition rate is improved by 53% compared to the baseline HMM. This corresponds to the result obtained with conventional HMM composition when 10 seconds of adaptation data is used. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(6): 39–48, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20093
