Abstract

Noise robustness has long been one of the most important goals in speech recognition. While the performance of automatic speech recognition (ASR) deteriorates in noisy conditions, the human auditory system is relatively adept at handling noise. To mimic this ability, we study psychoacoustic models and apply them to speech recognition to improve the robustness of ASR systems. Psychoacoustic models are usually implemented subtractively, with the aim of removing noise. However, this is not the only possible approach. This paper presents a novel algorithm that implements psychoacoustic models additively. The algorithm is motivated by the fact that weak sound components below the masking threshold are perceptually identical to the human auditory system, regardless of their actual sound pressure level. Another important contribution of the proposed algorithm is a more faithful implementation of the masking effect: only those sounds that fall below the masking threshold are modified, which better reflects physical masking. We give detailed experimental results showing the relationship between the subtractive and additive approaches. Since all the parameters of the proposed filters are positive or zero, they are named 2D psychoacoustic P-filters. A detailed theoretical analysis shows the noise removal ability of these filters. Experiments are carried out on the AURORA2 database. The results show that the proposed feature extraction method increases the word recognition rate. With models trained on clean speech, the proposed method achieves a word recognition rate of up to 84.23% on noisy data.
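
The contrast between the subtractive and additive treatments of sub-threshold components can be illustrated with a minimal sketch. It is not the paper's 2D P-filter, whose coefficients are not given in this abstract. It assumes a simple per-bin rule in which any component below a known masking threshold is either zeroed (subtractive) or raised to the threshold (additive). The function names, the threshold values, and the spectra are all hypothetical.

```python
from typing import List


def subtractive_mask(power: List[float], threshold: List[float]) -> List[float]:
    """Zero every bin below the masking threshold (assumed subtractive rule)."""
    return [0.0 if p < t else p for p, t in zip(power, threshold)]


def additive_mask(power: List[float], threshold: List[float]) -> List[float]:
    """Raise every bin below the masking threshold up to it (assumed additive rule).

    Bins at or above the threshold are left untouched, so only
    sub-threshold components are modified.
    """
    return [t if p < t else p for p, t in zip(power, threshold)]


# Hypothetical per-bin values for a single frame (arbitrary power units).
threshold = [1.0, 1.0, 1.0, 1.0]
clean = [5.0, 0.1, 8.0, 0.2]
noisy = [5.0, 0.6, 8.0, 0.9]  # weak noise that stays below the threshold

clean_add = additive_mask(clean, threshold)
noisy_add = additive_mask(noisy, threshold)
clean_sub = subtractive_mask(clean, threshold)
noisy_sub = subtractive_mask(noisy, threshold)

# Under both assumed rules the clean and noisy frames become identical,
# because they differ only in bins that lie below the threshold.
print(clean_add == noisy_add, clean_sub == noisy_sub)
```

In this toy case both rules map the clean and noisy frames to the same output, which reflects the abstract's observation that sub-threshold components are equivalent to the listener whatever their actual level. The two rules differ only in the value they assign to those bins.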
