Abstract

Speaker-independent speech recognition experiments using an auditory model front end with a spectro-temporal masking model demonstrated improved recognition performance, outperforming both auditory front ends without the masking model and traditional LPC-based front ends. The auditory model front end, composed of an adaptive Q cochlear filter bank incorporating spectro-temporal masking, was proposed previously [J. Acoust. Soc. Am. 92, 2476 (A) (1992)]. The spectro-temporal masking model enhances common phonetic features by eliminating the speaker-dependent spectral tilt that reflects individual source variation, and it also enhances the spectral dynamics that convey phonological information in speech signals. These advantages yield an effective new spectral parameter for representing speech models in speaker-independent speech recognition. Speaker-independent word and phoneme recognition experiments were carried out on Japanese word and phrase databases. The masked spectrum was calculated by subtracting the masking level from the logarithmic power spectra extracted with a 64-channel adaptive Q cochlear filter bank; the masking levels were computed as weighted sums of the smoothed preceding spectra. To cover the variability of the spectral time sequences, multi-template DTW and hidden Markov models were used as the backend recognition mechanisms.

a) Also at ATR Auditory and Visual Perception Res. Labs.
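The masking computation described above — subtracting a masking level, formed as a weighted sum of smoothed preceding frames, from each frame of log power spectra — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of preceding frames, the decaying weights, and the across-channel smoothing kernel are all assumptions; only the subtraction structure and the 64-channel filter-bank dimensionality come from the abstract.

```python
import numpy as np

def masked_spectrum(log_spec, weights=None, smooth_len=3):
    """Sketch of spectro-temporal masking.

    log_spec : (n_frames, n_channels) array of log power spectra,
               e.g. from a 64-channel cochlear filter bank.
    The masking level for frame t is a weighted sum of smoothed
    preceding frames; it is subtracted from the current frame.
    The weights and smoothing below are illustrative assumptions.
    """
    if weights is None:
        # Hypothetical decaying weights over 4 preceding frames.
        weights = np.array([0.4, 0.3, 0.2, 0.1])
    n_frames, n_ch = log_spec.shape

    # Smooth each frame across channels with a moving average
    # (one plausible choice of spectral smoothing).
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.apply_along_axis(
        lambda f: np.convolve(f, kernel, mode="same"), 1, log_spec)

    out = np.empty_like(log_spec)
    for t in range(n_frames):
        mask = np.zeros(n_ch)
        for k, w in enumerate(weights, start=1):
            if t - k >= 0:
                mask += w * smoothed[t - k]
        # Masked spectrum = current log spectrum minus masking level.
        out[t] = log_spec[t] - mask
    return out
```

Because the masking level depends only on preceding frames, the first frame passes through unchanged; for later frames, slowly varying components (such as a speaker-dependent spectral tilt present in every frame) are suppressed while rapid spectral dynamics are preserved.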
