Abstract

Reducing noise to produce clean speech under both stationary and non-stationary noise conditions, i.e. denoising, is one of the challenging tasks in single-channel speech enhancement. Traditional methods depend on first-order statistics, whereas deep learning models, through their capacity for multiple nonlinear transformations, can yield better results for reducing stationary and non-stationary noise in speech. To denoise a speech signal, we propose a deep learning approach that combines a UNet with a bidirectional long short-term memory (BiLSTM) network. A subset of the LibriSpeech dataset is used to create the training set by mixing in both stationary and non-stationary noise at different SNRs. The results are evaluated using the PESQ (perceptual evaluation of speech quality) and STOI (short-time objective intelligibility) speech quality metrics. We show experimentally that the proposed method achieves better denoising metrics under both stationary and non-stationary conditions.

Keywords

Convolutional neural network (CNN), Long short-term memory (LSTM), UNet, Perceptual evaluation of speech quality (PESQ), Short-time objective intelligibility (STOI), White noise, Urban noise, Stationary noise, Non-stationary noise
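The abstract does not give implementation details, but the general idea of a UNet-style encoder-decoder with a BiLSTM bottleneck applied to noisy spectrograms can be illustrated with a minimal PyTorch sketch. The layer sizes, the mask-based output, and the spectrogram dimensions below are assumptions for illustration only, not the architecture reported in the paper.

```python
# Minimal sketch (assumption): a convolutional encoder-decoder with skip
# connections and a BiLSTM bottleneck that predicts a denoising mask over
# magnitude-spectrogram frames. Sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class UNetBiLSTM(nn.Module):
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        # Encoder: two Conv1d blocks over the frequency axis of each frame
        # (a full UNet would also downsample between blocks)
        self.enc1 = nn.Sequential(nn.Conv1d(n_freq, 256, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(256, 128, 3, padding=1), nn.ReLU())
        # Bottleneck: BiLSTM models temporal context across frames
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        # Decoder: Conv1d blocks fed with skip connections from the encoder
        self.dec2 = nn.Sequential(nn.Conv1d(2 * hidden + 128, 256, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv1d(256 + 256, n_freq, 3, padding=1), nn.Sigmoid())

    def forward(self, mag):                     # mag: (batch, n_freq, frames)
        e1 = self.enc1(mag)                     # (batch, 256, frames)
        e2 = self.enc2(e1)                      # (batch, 128, frames)
        h, _ = self.bilstm(e2.transpose(1, 2))  # (batch, frames, 2*hidden)
        h = h.transpose(1, 2)                   # (batch, 2*hidden, frames)
        d2 = self.dec2(torch.cat([h, e2], dim=1))
        mask = self.dec1(torch.cat([d2, e1], dim=1))  # mask values in [0, 1]
        return mask * mag                       # enhanced magnitude spectrogram

# Usage: denoise a batch of noisy magnitude spectrograms
model = UNetBiLSTM()
noisy = torch.rand(4, 257, 100)                 # 4 utterances, 257 bins, 100 frames
enhanced = model(noisy)
print(enhanced.shape)                           # torch.Size([4, 257, 100])
```

Predicting a multiplicative mask rather than the clean spectrogram directly is a common design choice in speech enhancement; the enhanced magnitude would then be combined with the noisy phase and inverted back to a waveform before computing PESQ and STOI.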
