Abstract

In this study, we propose a novel dilated convolutional recurrent neural network for real-time monaural speech enhancement. The proposed model combines dilated causal convolutions with a long short-term memory (LSTM) layer and skip connections to track a target speaker in a single-channel noisy-reverberant mixture. The model was evaluated in simulated rooms with different reverberation times and unseen background noises. Experimental results show significant improvements in objective speech intelligibility and speech quality of the enhanced speech using the proposed model compared to LSTM, gated residual network (GRN), and convolutional recurrent network (CRN) models. Moreover, the proposed model generalizes better to untrained speakers and unseen noises than LSTM, GRN, and CRN, while having fewer trainable parameters and supporting real-time operation.
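As a rough illustration of the dilated causal convolution building block mentioned above (a minimal NumPy sketch, not the authors' implementation; the function name and single-channel setup are assumptions for clarity):

```python
import numpy as np

def dilated_causal_conv1d(x, weights, dilation):
    """Causal 1-D convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ..., via zero left-padding, so no
    future samples are used (a requirement for real-time processing)."""
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad to stay causal
    return np.array([
        sum(weights[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Stacking such layers with dilations 1, 2, 4, ... grows the receptive
# field exponentially with depth while remaining causal.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = dilated_causal_conv1d(x, np.array([1.0, 1.0]), dilation=2)
# y → [1. 2. 4. 6. 8.]  (each output sums x[t] and x[t-2])
```

Stacking these layers before an LSTM, with skip connections carrying intermediate activations to the output, is the general pattern the abstract describes; exact layer counts and filter shapes are given in the full text.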
