Abstract
This paper describes a new speech enhancement approach using perceptually based noise reduction. The proposed approach applies two perceptual filtering models to noisy speech signals: the gammatone and the gammachirp filter banks, with nonlinear frequency resolution according to the equivalent rectangular bandwidth (ERB) scale. The perceptual filtering yields a number of subbands that are individually spectrally weighted and modified according to two different noise suppression rules. An accurate noise estimate is important for reducing the musical noise artifacts that appear in the processed speech after a classic subtractive process. In this context, we use continuous noise estimation algorithms. The performance of the proposed approach is evaluated on speech signals corrupted by real-world noises. Using objective tests based on the perceptual quality PESQ score and the quality ratings of signal distortion (SIG), noise distortion (BAK) and overall quality (OVRL), and a subjective test based on the quality rating of automatic speech recognition (ASR), we demonstrate that our speech enhancement approach, using filter banks that model the human auditory system, outperforms conventional spectral modification algorithms in improving the quality and intelligibility of the enhanced speech signal.
Highlights
High-quality speech sound in real environments is very important for automatic speech processing systems and human-machine interfaces
Using objective tests based on the perceptual quality PESQ score and the quality ratings of signal distortion (SIG), noise distortion (BAK) and overall quality (OVRL), and a subjective test based on the quality rating of automatic speech recognition (ASR), we demonstrate that our speech enhancement approach, using filter banks that model the human auditory system, outperforms conventional spectral modification algorithms in improving the quality and intelligibility of the enhanced speech signal
To cover the frequency range of the signal, the analysis stage used in the multiband subtraction consists of gammatone/gammachirp filter banks of 27 fourth-order filters spaced according to the equivalent rectangular bandwidth (ERB) scale
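The ERB-spaced analysis stage described in the last highlight can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It reads the highlight as 27 filters of order 4 and uses the standard Glasberg and Moore ERB formulas and the textbook gammatone impulse response. The frequency limits (100 Hz to 3500 Hz), the 32 ms impulse response length, the bandwidth factor `b = 1.019`, and all function names are assumptions not given in the source. Only the gammatone case is shown; the gammachirp variant is not covered.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def erb_center_frequencies(f_low, f_high, n_bands=27):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_bands)
    return erb_rate_to_hz(e)

def gammatone_ir(fc, fs, order=4, duration=0.032, b=1.019):
    """Peak-normalised gammatone impulse response (assumed parameters)."""
    t = np.arange(int(round(duration * fs))) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_bandwidth(fc) * t)
    g = envelope * np.cos(2.0 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_analysis(x, fs, f_low=100.0, f_high=3500.0, n_bands=27):
    """Split x into n_bands subbands by FIR filtering with gammatone responses."""
    fcs = erb_center_frequencies(f_low, f_high, n_bands)
    subbands = np.stack(
        [np.convolve(x, gammatone_ir(fc, fs))[: len(x)] for fc in fcs]
    )
    return fcs, subbands
```

Because the centre frequencies are equally spaced on the ERB-rate scale, their spacing in Hz widens with frequency, which is the nonlinear resolution the highlight refers to. A call such as `gammatone_analysis(x, 8000.0)` returns the 27 centre frequencies and an array with one row per subband.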
Summary
High-quality speech sound in real environments is very important for automatic speech processing systems and human-machine interfaces. Many methods have been developed to remove background noise while retaining speech intelligibility, based on short-time spectral estimation of the clean speech. These methods are able to reduce the noise and improve the quality, but at the expense of introducing speech distortion, which results in a loss of intelligibility. It is proposed to adapt the spectral modification algorithms to a multiband analysis using human perceptual filter bank models, following the critical band concept and nonlinear frequency resolution. This makes it possible to find the best tradeoff between the amount of noise reduction, the speech distortion and the level of musical noise from a perceptual point of view, and to overcome the limitations of spectral modification algorithms for speech enhancement in real-world listening situations where the background noise level and characteristics are constantly changing.
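The tradeoff between noise reduction, speech distortion and musical noise can be illustrated with a generic subband power-subtraction rule. The sketch below is not one of the paper's two suppression rules, which this summary does not specify. It assumes a per-band noise power estimate is already available, and the over-subtraction factor `alpha`, the spectral floor `beta`, the frame length, and the function names are all hypothetical choices. Frames do not overlap and the subbands are simply summed for resynthesis, which a practical system would refine.

```python
import numpy as np

def subtraction_gain(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Amplitude gain from power subtraction with over-subtraction and a floor.

    alpha > 1 removes more noise at the cost of more speech distortion;
    beta > 0 keeps a residual floor that masks musical noise.
    """
    noisy_power = np.maximum(np.asarray(noisy_power, dtype=float), 1e-12)
    clean_power = noisy_power - alpha * np.asarray(noise_power, dtype=float)
    floor = beta * noisy_power
    return np.sqrt(np.maximum(clean_power, floor) / noisy_power)

def enhance_subbands(subbands, noise_power, frame_len=256, alpha=2.0, beta=0.01):
    """Weight each subband frame by its gain, then sum the subbands.

    subbands: array of shape (n_bands, n_samples).
    noise_power: array of shape (n_bands,), assumed to be estimated elsewhere.
    """
    subbands = np.asarray(subbands, dtype=float)
    out = np.zeros_like(subbands)
    for start in range(0, subbands.shape[1], frame_len):
        frame = subbands[:, start:start + frame_len]
        power = np.mean(frame ** 2, axis=1)           # per-band frame power
        gain = subtraction_gain(power, noise_power, alpha, beta)
        out[:, start:start + frame_len] = gain[:, None] * frame
    return out.sum(axis=0)                            # naive resynthesis
```

Raising `alpha` suppresses more noise but attenuates weak speech components, while raising `beta` leaves more residual noise but smooths the isolated peaks heard as musical noise. In a setting where the noise changes over time, `noise_power` would be updated continuously rather than held fixed as it is here.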