Abstract

A spoken word recognition method that uses dynamic features of speech and neural networks is presented. The dynamic features are obtained from a two-dimensional mel-cepstrum (TDMC), defined as the two-dimensional Fourier transform, over frequency and time, of mel-frequency-scaled log spectra. Within the analyzed interval, the TDMC represents averaged spectral features, dynamic spectral features, and the averaged and dynamic features of power of the two-dimensional mel-log spectra. The neural network is a three-layer feedforward network trained with the back-propagation algorithm; its inputs are the dynamic spectral features together with the averaged and dynamic power features. In speaker-dependent word recognition experiments on 100 Japanese city names uttered by nine speakers, dynamic spectral features smoothed with respect to time proved effective, and a recognition accuracy of 99.1% was obtained.
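The TDMC definition above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the mel-frequency-scaled log spectra are already available as a frames-by-channels matrix, applies a plain two-dimensional discrete Fourier transform, and uses a 1/(T*K) scaling chosen here for convenience. The function names `tdmc` and `split_features`, and the mapping of index regions to the four feature groups named in the abstract, are assumptions made for illustration only.

```python
import cmath
import math


def tdmc(log_mel):
    """Plain 2-D DFT of a (frames x mel-channels) log-spectrum matrix.

    Returns C[q][n], where q indexes the transform along frequency
    (quefrency) and n indexes the transform along time.
    """
    T = len(log_mel)       # number of frames in the analyzed interval
    K = len(log_mel[0])    # number of mel-scaled frequency channels
    C = [[0j] * T for _ in range(K)]
    for q in range(K):
        for n in range(T):
            acc = 0j
            for t in range(T):
                for k in range(K):
                    phase = -2.0 * math.pi * (q * k / K + n * t / T)
                    acc += log_mel[t][k] * cmath.exp(1j * phase)
            C[q][n] = acc / (T * K)  # scaling is an assumption, not from the source
    return C


def split_features(C):
    """Group coefficients by index region (an assumed reading of the abstract).

    q == 0 : averaged over frequency channels, i.e. power-like terms
    n == 0 : averaged over time, i.e. static terms
    """
    return {
        "averaged_power": C[0][0],
        "dynamic_power": C[0][1:],
        "averaged_spectral": [row[0] for row in C[1:]],
        "dynamic_spectral": [row[1:] for row in C[1:]],
    }


if __name__ == "__main__":
    # Toy input: 4 frames, 3 mel channels, made-up log-spectrum values.
    toy = [
        [1.0, 2.0, 3.0],
        [1.5, 2.0, 2.5],
        [2.0, 2.0, 2.0],
        [1.5, 2.0, 2.5],
    ]
    features = split_features(tdmc(toy))
    print(features["averaged_power"])
    print(features["dynamic_power"])
```

Because the input is real-valued, the coefficients are conjugate-symmetric, so only about half of them carry independent information. A spectrum that does not change over time yields zero coefficients for every n > 0, which is why those terms can be read as dynamic features. The direct quadruple loop is for clarity only; a fast Fourier transform would be used in practice.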
