Speaker-independent telephone speech recognition and reference pattern generation

Hiroshi Iizuka, Makoto Morito, Kozo Yamada

doi:10.1250/ast.7.155

Open Access

https://doi.org/10.1250/ast.7.155

Abstract

This paper describes a speaker-independent isolated-word speech recognition method developed for telephone speech response systems. To recognize speech, each input utterance is first frequency-analyzed by a bank of 19 band-pass filters (BPFs) with a frame period of 8 ms. The analyzed data then undergo logarithmic conversion, normalization of the vocal cord source characteristics using a least-squares fitted line, and time normalization by linear compression or expansion (linear companding) to 32 frames. The resulting speech patterns are matched against multiple reference patterns generated in advance, separately for male and female speakers. In applying this method, the reference patterns must be optimized so that speech is recognized correctly despite differences in formant frequencies, differences in individual speakers' habits, variations of phonetic positions, non-vocalization, and slight segmentation errors. To evaluate the method, the voices of about 2,000 persons were recorded over long-distance telephone lines, using a vocabulary of 16 Japanese words. A total of 256 male and female reference patterns were generated from the training voice data of about 570 persons. The recognition accuracy on non-training voice data was 97.8%.
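
The preprocessing and matching steps named in the abstract can be illustrated with a minimal Python sketch. Only the figures 19 channels and 32 frames come from the abstract. Everything else is an assumption made for illustration: the function names, the small floor applied before the logarithm, subtracting the fitted line as the way of normalizing the source characteristics, linear interpolation between frames as the form of time normalization, and the Euclidean distance used for matching. It is not the authors' implementation.

```python
import math

NUM_CHANNELS = 19  # band-pass filter channels (from the abstract)
NUM_FRAMES = 32    # frames after time normalization (from the abstract)


def remove_spectral_tilt(frame):
    """Log-convert one frame, then subtract its least-squares line over channels.

    Subtracting the fitted line is an assumed reading of the abstract's
    "normalization ... by least squares approximation line".
    """
    log_frame = [math.log(max(v, 1e-10)) for v in frame]  # floor is an assumption
    n = len(log_frame)
    x_mean = (n - 1) / 2.0
    y_mean = sum(log_frame) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(log_frame))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(log_frame)]


def normalize_time(frames, target=NUM_FRAMES):
    """Linearly stretch or compress a frame sequence to `target` frames."""
    n = len(frames)
    if n == 1:
        return [list(frames[0]) for _ in range(target)]
    out = []
    for t in range(target):
        pos = t * (n - 1) / (target - 1)  # fractional index into the input
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def preprocess(frames):
    """Turn raw filter-bank frames into a fixed-size 32 x 19 pattern."""
    return normalize_time([remove_spectral_tilt(f) for f in frames])


def distance(p, q):
    """Euclidean distance between two patterns (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2
                         for fp, fq in zip(p, q)
                         for a, b in zip(fp, fq)))


def classify(pattern, references):
    """Return the word label of the nearest reference pattern.

    `references` is a list of (word, pattern) pairs; a word may appear
    several times, mirroring the use of multiple reference patterns.
    """
    return min(references, key=lambda ref: distance(pattern, ref[1]))[0]
```

In this sketch a word can have any number of reference patterns, so separate male and female references are simply additional entries in the `references` list.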
