Abstract

Large‐vocabulary continuous speech recognition systems have recently come into widespread use, encouraging the application of speech recognition techniques to a variety of problems. One factor that degrades the performance of such systems is a mismatch between the acoustic properties of the user's speech and the acoustic model. Acoustic models are generally trained on the speech of young or middle‐aged adults, so a mismatch arises between the model and the acoustic properties of elderly speech, which may lower the recognition rate. In this study, a large‐scale elderly speech database (200 sentences × 301 subjects) is used to train an acoustic model, and the resulting elderly acoustic model is evaluated with a large‐vocabulary continuous speech recognition system. In the experiments, the word recognition rate improved by 3 to 5% compared with an acoustic model trained on young or middle‐aged adult speech, namely the JNAS speech database (150 sentences × 260 subjects, average age 28.6 years). It is also verified experimentally that, in speaker adaptation to elderly speech, the recognition rate improves further when an acoustic model trained on elderly speech is used. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(7): 49–57, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20101
