Abstract

Reducing noise to produce clean speech under both stationary and non-stationary noise conditions, i.e. denoising, is one of the challenging tasks in single-channel speech enhancement. Traditional methods depend on first-order statistics, whereas deep learning models, through their capacity for multiple nonlinear transformations, can yield better results for reducing stationary and non-stationary noise in speech. To denoise a speech signal, we propose a deep learning approach that combines a UNet with a bidirectional long short-term memory (BiLSTM) network. A subset of the LibriSpeech dataset is used to create the training set by mixing in both stationary and non-stationary noise at different SNRs. The results are evaluated using the PESQ (perceptual evaluation of speech quality) and STOI (short-time objective intelligibility) speech quality metrics. We show experimentally that the proposed method achieves better denoising metrics under both stationary and non-stationary conditions.

Keywords

Convolutional neural network (CNN), Long short-term memory (LSTM), UNet, Perceptual evaluation of speech quality (PESQ), Short-time objective intelligibility (STOI), White noise, Urban noise, Stationary noise, Non-stationary noise
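The abstract does not give implementation details, but the general idea of a UNet-style encoder-decoder with a BiLSTM bottleneck applied to noisy spectrograms can be illustrated with a minimal PyTorch sketch. The layer sizes, the mask-based output, and the spectrogram dimensions below are assumptions for illustration only, not the architecture reported in the paper.

```python
# Minimal sketch (assumption): a convolutional encoder-decoder with skip
# connections and a BiLSTM bottleneck that predicts a denoising mask over
# magnitude-spectrogram frames. Sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class UNetBiLSTM(nn.Module):
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        # Encoder: two Conv1d blocks over the frequency axis of each frame
        # (a full UNet would also downsample between blocks)
        self.enc1 = nn.Sequential(nn.Conv1d(n_freq, 256, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(256, 128, 3, padding=1), nn.ReLU())
        # Bottleneck: BiLSTM models temporal context across frames
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        # Decoder: Conv1d blocks fed with skip connections from the encoder
        self.dec2 = nn.Sequential(nn.Conv1d(2 * hidden + 128, 256, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv1d(256 + 256, n_freq, 3, padding=1), nn.Sigmoid())

    def forward(self, mag):                     # mag: (batch, n_freq, frames)
        e1 = self.enc1(mag)                     # (batch, 256, frames)
        e2 = self.enc2(e1)                      # (batch, 128, frames)
        h, _ = self.bilstm(e2.transpose(1, 2))  # (batch, frames, 2*hidden)
        h = h.transpose(1, 2)                   # (batch, 2*hidden, frames)
        d2 = self.dec2(torch.cat([h, e2], dim=1))
        mask = self.dec1(torch.cat([d2, e1], dim=1))  # mask values in [0, 1]
        return mask * mag                       # enhanced magnitude spectrogram

# Usage: denoise a batch of noisy magnitude spectrograms
model = UNetBiLSTM()
noisy = torch.rand(4, 257, 100)                 # 4 utterances, 257 bins, 100 frames
enhanced = model(noisy)
print(enhanced.shape)                           # torch.Size([4, 257, 100])
```

Predicting a multiplicative mask rather than the clean spectrogram directly is a common design choice in speech enhancement; the enhanced magnitude would then be combined with the noisy phase and inverted back to a waveform before computing PESQ and STOI.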
