Speaker independent telephone speech recognition

H Iizuka

doi:10.1109/icassp.1985.1168288

Abstract

This paper describes a recognition method, a reference pattern generation method, and an evaluation of speaker-independent recognition for telephone speech response systems. The input utterance is analyzed by a bank of 19 band-pass filters (BPFs). Power and vocal cord source characteristics are normalized. Time normalization is performed by linearly compressing or expanding the utterance to 32 frames. The resulting speech pattern is matched against male and female reference patterns, and the category of the nearest reference pattern is taken as the result. The reference patterns must be optimized so that speech is recognized correctly despite differences in formant frequencies and slight segmentation errors. To optimize them, recognition of the training patterns and updating of the reference patterns are repeated. A total of 256 male and female reference patterns were generated. The recognition accuracy of this method on non-training voice data was 95.8% with automatic segmentation.
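The time normalization and nearest-pattern matching steps can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not state the resampling rule or the distance measure, so linear interpolation between neighbouring frames and a squared Euclidean distance are assumptions made here. The function names `linear_time_normalize`, `pattern_distance`, and `recognize` are hypothetical, and power and source normalization are omitted.

```python
def linear_time_normalize(frames, target=32):
    """Linearly compress or expand a sequence of filter-bank frames to `target` frames.

    `frames` is a list of equal-length lists (e.g. 19 channel outputs per frame).
    Assumption: new frames are linearly interpolated between neighbouring frames.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("utterance has no frames")
    if n == 1:
        return [list(frames[0]) for _ in range(target)]
    out = []
    for t in range(target):
        # Map the output frame index onto the input time axis.
        pos = t * (n - 1) / (target - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def pattern_distance(a, b):
    """Squared Euclidean distance between two patterns (assumed measure)."""
    return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def recognize(pattern, references):
    """Return the category of the nearest reference pattern.

    `references` is a list of (category, pattern) pairs and may hold
    several male and female patterns per category.
    """
    return min(references, key=lambda ref: pattern_distance(pattern, ref[1]))[0]
```

In this sketch an utterance of any length is first passed through `linear_time_normalize`, and the resulting 32-frame pattern is then given to `recognize` together with the full set of reference patterns.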
