Abstract

This paper describes a speaker-independent isolated-word speech recognition method developed for telephone speech response systems. To recognize speech, input utterances are first frequency-analyzed by a 19-channel bank of band-pass filters (BPFs) with a frame period of 8 ms. The analyzed data then undergo logarithmic conversion, normalization of the vocal cord sound source characteristics by subtracting a least-squares approximation line, and time normalization by linear compression or expansion to 32 frames. The resulting speech patterns are matched against multiple reference patterns generated in advance, separately for male and female speakers. In applying this method, the reference patterns must be optimized so that speech is recognized correctly despite differences in formant frequencies, differences in individual speakers' habits, variations of phonetic positions, non-vocalization, and slight segmentation errors. To evaluate the method, the voices of about 2,000 persons were recorded over long-distance telephone lines, using a vocabulary of 16 Japanese words. A total of 256 male and female reference patterns were generated from the training voice data of about 570 persons. The recognition accuracy of this method on non-training voice data was 97.8%.
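
The preprocessing and matching steps described above can be sketched roughly as follows. This is an illustrative outline in plain Python, not the authors' implementation. The function names, the energy floor, the use of linear interpolation for time normalization, and the squared Euclidean distance are all assumptions, since the abstract does not specify them. The least-squares line is assumed to be fitted across the filter channels of each frame.

```python
import math

N_FRAMES = 32  # target pattern length stated in the abstract


def log_spectrum(frame, floor=1e-6):
    """Logarithmic conversion of one frame of filter-bank energies.
    The floor is an assumption that avoids log(0)."""
    return [math.log(max(e, floor)) for e in frame]


def remove_tilt(log_frame):
    """Subtract the least-squares line fitted across channels
    (assumes at least two channels)."""
    n = len(log_frame)
    center = (n - 1) / 2.0
    xs = [c - center for c in range(n)]  # centered channel index, sums to 0
    mean = sum(log_frame) / n
    slope = sum(x * y for x, y in zip(xs, log_frame)) / sum(x * x for x in xs)
    return [y - (mean + slope * x) for x, y in zip(xs, log_frame)]


def time_normalize(frames, n_frames=N_FRAMES):
    """Linearly stretch or compress a frame sequence to n_frames
    (linear interpolation between neighbouring frames is an assumption)."""
    last = len(frames) - 1
    out = []
    for k in range(n_frames):
        pos = last * k / (n_frames - 1)  # fractional source frame position
        lo = int(pos)
        hi = min(lo + 1, last)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def make_pattern(energies):
    """Turn a list of frames (each a list of channel energies) into a pattern."""
    return time_normalize([remove_tilt(log_spectrum(f)) for f in energies])


def distance(p, q):
    """Squared Euclidean distance between two patterns (an assumed metric)."""
    return sum((a - b) ** 2 for fp, fq in zip(p, q) for a, b in zip(fp, fq))


def recognize(pattern, references):
    """Return the word of the nearest reference pattern.
    references: list of (word, pattern) pairs; several pairs may share a word."""
    return min(references, key=lambda ref: distance(pattern, ref[1]))[0]
```

Under these assumptions, an utterance of any length with 19 energies per frame becomes a 32 × 19 pattern. Holding several reference patterns per word, for example separate male and female ones, only requires listing the same word more than once in `references`.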
