Abstract

In this study, we propose a novel dilated convolutional recurrent neural network for real-time monaural speech enhancement. The proposed model combines dilated causal convolutions with a long short-term memory (LSTM) layer and skip connections to track a target speaker in a single-channel noisy-reverberant mixture. The model was evaluated in simulated rooms with different reverberation times and unseen background noises. Experimental results show significant improvements in objective speech intelligibility and speech quality with the proposed model over LSTM, gated residual network (GRN), and convolutional recurrent network (CRN) baselines. Moreover, the proposed model generalizes better to untrained speakers and unseen noises than LSTM, GRN, and CRN, while having fewer trainable parameters and supporting real-time operation.
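The dilated causal convolution mentioned above is the key building block: the output at time t depends only on the current and past inputs, with the taps spaced `dilation` samples apart so the receptive field grows without adding parameters. The following is a minimal numpy sketch of that operation for a single channel; the function name and signature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Illustrative dilated causal 1-D convolution (not the paper's code).

    Output at time t uses only x[t], x[t-d], ..., x[t-(k-1)d],
    so no future samples leak in (causality), which is what makes
    the model usable in real time.
    """
    k = len(kernel)
    pad = (k - 1) * dilation                      # left-pad to stay causal
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):                        # tap i looks back i*dilation steps
            y[t] += kernel[i] * xp[t + pad - i * dilation]
    return y

# A unit impulse through a two-tap kernel with dilation 2 produces
# responses at lags 0 and 2, and nothing before the impulse.
impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
response = dilated_causal_conv1d(impulse, np.array([1.0, 1.0]), dilation=2)
```

In the full model, several such layers with increasing dilation factors would be stacked (with skip connections) before the LSTM layer, giving a large temporal context at low cost.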
