Abstract

The performance of a speech recognizer drops significantly when there is an acoustic mismatch between the training and operational environments. A speech recognizer is called robust if it maintains good recognition accuracy even under mismatched conditions. The present study addresses the recognition of English speech in noisy environments and compares, on the basis of average recognition rate, the frequency scales used in parameterization. For robust automatic speech recognition, a front-end signal enhancement component, the spectral subtraction algorithm, prefilters the noisy input speech before it is fed to the recognizer. Several frequency-warped scales are used in the parameterization for feature extraction from the enhanced speech: the perceptual Mel, Bark, and equivalent rectangular bandwidth (ERB) rate scales, and a non-perceptual uniform scale. A suite of experiments evaluates the performance of the speech recognizer, with and without the front-end signal enhancement component, in a variety of noisy environments. Recognition accuracy is measured at the word level over a wide range of signal-to-noise ratios for both stationary and non-stationary noises.
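As an illustration of how the four frequency scales differ in parameterization, the sketch below places filter center frequencies uniformly on each warped axis and maps them back to Hz. The abstract does not state which formulas the study uses, so the ones here are assumptions: the common 2595·log10 Mel formula, Traunmüller's Bark approximation, and the Glasberg and Moore ERB-rate formula. The function names, the filter count, and the band limits are hypothetical.

```python
import math

# Assumed warping formulas; the study may use different variants.
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    # Traunmueller's approximation of the Bark scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(bark):
    return 1960.0 * (bark + 0.53) / (26.28 - bark)

def hz_to_erb_rate(f_hz):
    # Glasberg and Moore ERB-rate scale.
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

# Each entry maps a scale name to (Hz -> warped, warped -> Hz).
SCALES = {
    "uniform": (lambda f: f, lambda w: w),
    "mel": (hz_to_mel, mel_to_hz),
    "bark": (hz_to_bark, bark_to_hz),
    "erb": (hz_to_erb_rate, erb_rate_to_hz),
}

def center_frequencies(scale, n_filters, f_low, f_high):
    """Return n_filters center frequencies (Hz), equally spaced on the warped axis."""
    to_warped, to_hz = SCALES[scale]
    lo, hi = to_warped(f_low), to_warped(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [to_hz(lo + step * (i + 1)) for i in range(n_filters)]

if __name__ == "__main__":
    # Hypothetical settings: 8 filters between 100 Hz and 4000 Hz.
    for name in SCALES:
        centers = center_frequencies(name, 8, 100.0, 4000.0)
        print(name, [round(c) for c in centers])
```

On the perceptual scales the centers crowd toward low frequencies, whereas the uniform scale spaces them evenly in Hz; this difference in spectral resolution is what a comparison of the scales examines.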
