Abstract

This paper discusses speech recognition based on the time course of local spectral peaks, such as formants, which are considered important in phoneme perception. A measure of the dynamic behavior of the spectrum is proposed, based on a functional model of the FM‐neuron, whose existence has been shown in auditory physiology. First, the FM‐neuron is modeled as a time‐frequency filter on the spectral time series that responds only to shifts in the local peak frequency of the spectrum and discriminates the direction of the shift. A measure of the difference between outputs of the FM‐neuron model is then derived from the cepstral expansion of the spectrum; this measure is called the spectral movement similarity. It is shown that the spectral movement similarity on the auditory nonlinear frequency axis can be realized equivalently by frequency weighting. A spoken‐word recognition experiment is conducted using dynamic time warping (DTW) with the spectral movement similarity. Combining the proposed measure with a traditional spectral distance greatly reduces the recognition error compared with using the traditional spectral distance alone. The improvement is more pronounced when the cepstral distance is used as the spectral distance, in which case the recognition error is reduced to one‐fourth.
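As a rough illustration of how a static spectral distance and a movement-based term might be combined inside DTW, the following is a minimal sketch under stated assumptions. The abstract does not give the formula for the spectral movement similarity, so the movement term here (one minus the cosine similarity of frame-to-frame cepstral differences) is a hypothetical stand-in, not the paper's measure. The function names, the `weight` parameter, and the linear combination are likewise assumptions made for illustration only.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (the static term)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deltas(frames):
    """Central frame-to-frame differences, clamped at the sequence edges.

    Used here as a crude proxy for spectral movement (an assumption,
    not the FM-neuron model output described in the abstract).
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
    return out


def movement_dissimilarity(da, db):
    """Hypothetical movement term: 1 - cosine similarity of two delta vectors."""
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(x * x for x in db))
    if na == 0.0 and nb == 0.0:
        return 0.0  # neither spectrum is moving: treat as identical movement
    if na == 0.0 or nb == 0.0:
        return 1.0  # only one is moving: treat as unrelated movement
    cos = sum(x * y for x, y in zip(da, db)) / (na * nb)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return 1.0 - cos


def dtw_combined(ref, test, weight=0.5):
    """Accumulated DTW cost with a weighted sum of the static and movement terms."""
    d_ref, d_test = deltas(ref), deltas(test)
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (1.0 - weight) * cepstral_distance(ref[i - 1], test[j - 1])
            local += weight * movement_dissimilarity(d_ref[i - 1], d_test[j - 1])
            # Standard symmetric step pattern: insertion, deletion, match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


rising = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
falling = list(reversed(rising))
print(dtw_combined(rising, rising))
print(dtw_combined(rising, falling))
```

With `weight=0.0` the sketch reduces to DTW on the static distance alone, which mirrors the baseline condition the abstract compares against; any other weight mixes in the movement term.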
