Speaker-dependent 100 word recognition using dynamic spectral features of speech and neural networks

T Kitamura, K Nishioka, A Ito, E Hayahara

doi:10.1109/mwscas.1991.252106

Abstract

A spoken word recognition method using dynamic features of speech and neural networks is presented. The dynamic features are obtained from a two-dimensional mel-cepstrum (TDMC), defined as the two-dimensional Fourier transform of mel-frequency scaled log spectra in the frequency and time domains. Over the analyzed interval, the TDMC contains averaged spectral features, dynamic spectral features, and averaged and dynamic power features of the two-dimensional mel-log spectra. The neural network used in this study is a three-layered feedforward network trained with the back-propagation algorithm. Its input consists of the dynamic spectral features together with the averaged and dynamic power features. Speaker-dependent word recognition experiments on 100 Japanese city names uttered by nine speakers show that dynamic spectral features smoothed with respect to time are effective, and a recognition accuracy of 99.1% was obtained.
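To make the TDMC definition concrete, here is a minimal sketch in Python with NumPy, assuming the mel-log spectra are already available as a matrix of shape (mel bins, time frames). The function names `tdmc` and `split_features`, the 1/size normalization, the cut-off parameters `n_quefrency` and `n_modulation`, and the synthetic input are illustrative assumptions; the abstract does not give the authors' exact analysis conditions or smoothing procedure. Under this reading, the zero-frequency term along the time axis carries the averaged features, nonzero time-axis terms carry the dynamic features, and the zeroth term along the frequency axis relates to power.

```python
import numpy as np


def tdmc(mel_log_spec):
    """Two-dimensional DFT of a (n_mel_bins, n_frames) mel-log spectrogram.

    Assumption: plain 2-D DFT scaled by 1/size, so that the (0, 0) term
    equals the mean of the input over the analyzed interval.
    """
    spec = np.asarray(mel_log_spec, dtype=float)
    return np.fft.fft2(spec) / spec.size


def split_features(coeffs, n_quefrency, n_modulation):
    """Group low-order TDMC terms; both cut-offs are hypothetical."""
    avg_power = coeffs[0, 0].real                      # mean level of the interval
    dyn_power = coeffs[0, 1:n_modulation + 1]          # time variation of the level
    avg_spectral = coeffs[1:n_quefrency + 1, 0]        # time-averaged spectral shape
    dyn_spectral = coeffs[1:n_quefrency + 1, 1:n_modulation + 1]  # time variation of shape
    return avg_power, dyn_power, avg_spectral, dyn_spectral


# Synthetic stand-in for real mel-log spectra: 20 mel bins, 32 frames.
rng = np.random.default_rng(0)
spec = rng.normal(size=(20, 32))

coeffs = tdmc(spec)
avg_power, dyn_power, avg_spectral, dyn_spectral = split_features(coeffs, 8, 4)
```

Keeping only low-order terms along the time axis is one simple way to obtain features that vary smoothly in time; whether this matches the smoothing used in the paper is not stated in the abstract.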
