Abstract

The performance of large-vocabulary automatic speech recognition (ASR) systems deteriorates severely under mismatched training and testing conditions. Signal-processing techniques based on the human auditory system have been proposed to improve ASR performance, especially in adverse acoustic conditions. This paper compares one such scheme, the ensemble interval histogram (EIH), with conventional mel cepstral analysis (MEL). The two spectral feature extraction methods were implemented as front ends to a state-of-the-art continuous speech recognizer and evaluated on the TIMIT database (male speakers). To characterize the influence of signal distortion on the representation of different sounds, phone classification experiments were conducted under three acoustic conditions: clean speech, speech over a telephone channel, and speech under room reverberation (the last two simulated). Classification was performed with static features alone and with static plus dynamic features, to assess the relative contribution of the time derivatives. Performance is reported as the percentage of phones correctly classified. Confusion matrices were also derived from the phone classifications to provide diagnostic information.
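Two quantities mentioned above can be made concrete with a minimal sketch: appending dynamic (delta) features to static cepstral features via the standard regression formula, and deriving a confusion matrix plus percent-correct from phone classification labels. The function names, the delta window width N, and the toy labels below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def delta_features(static, N=2):
    """Append time-derivative (delta) features to a (T, D) matrix of
    static features, using the common regression formula
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2).
    (The window width N here is an illustrative choice.)"""
    T, D = static.shape
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Replicate edge frames so every frame has N neighbours on both sides.
    padded = np.pad(static, ((N, N), (0, 0)), mode="edge")
    deltas = np.zeros_like(static)
    for n in range(1, N + 1):
        deltas += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    deltas /= denom
    return np.hstack([static, deltas])

def confusion_matrix(ref, hyp, n_classes):
    """Count how often reference phone class ref[i] was classified as
    hyp[i]; the diagonal holds the correct classifications."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for r, h in zip(ref, hyp):
        M[r, h] += 1
    return M

# Toy example: 5 phone tokens over 3 phone classes (labels are made up).
ref = [0, 1, 2, 2, 1]
hyp = [0, 1, 1, 2, 1]
M = confusion_matrix(ref, hyp, n_classes=3)
percent_correct = 100.0 * np.trace(M) / M.sum()
```

For a frame sequence whose static feature rises linearly in time, the appended delta column is constant in the interior (the slope), which is the sense in which the dynamic features capture temporal change; the off-diagonal entries of `M` show which phone classes are confused with which, the diagnostic use described in the abstract.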
