Abstract

With the development of deep neural networks, speech enhancement has improved substantially. However, commonly used speech enhancement approaches cannot fully leverage contextual information across different scales, which limits further performance gains. To address this problem, we propose a nested U-Net with self-attention and dense connectivity (SADNUNet) for monaural speech enhancement in the time domain. SADNUNet is an encoder-decoder structure with skip connections. Within SADNUNet, a multi-scale aggregation block is proposed to capture richer contextual information from different scales. In this way, both global and local speech features can be fully exploited to improve speech reconstruction. Furthermore, dense connectivity and self-attention are incorporated into the network for better feature extraction and utterance-level context aggregation. Experimental results demonstrate that the proposed approach achieves performance on par with or better than competing models in objective speech intelligibility and quality scores.
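
The abstract does not specify the network's exact configuration, but the overall layout it describes (a time-domain encoder-decoder with skip connections, dense connectivity inside the blocks, and self-attention for utterance-level context) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the names (`DenseBlock`, `TinyUNet`), channel counts, kernel sizes, and number of levels are assumptions for clarity, not the authors' actual SADNUNet implementation.

```python
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    """Dense connectivity: each conv layer receives the concatenation of all
    earlier feature maps (hypothetical layout; the paper's exact block is not
    given in the abstract)."""
    def __init__(self, channels, growth=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv1d(in_ch, growth, kernel_size=3, padding=1),
                nn.PReLU()))
            in_ch += growth
        self.out = nn.Conv1d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.out(torch.cat(feats, dim=1))


class TinyUNet(nn.Module):
    """Two-level waveform encoder-decoder with a skip connection and a
    self-attention bottleneck, sketching the general structure described
    in the abstract (not the actual nested SADNUNet)."""
    def __init__(self, channels=32, num_heads=4):
        super().__init__()
        self.in_conv = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.enc1 = DenseBlock(channels)
        self.down = nn.Conv1d(channels, channels, kernel_size=4, stride=2, padding=1)
        self.enc2 = DenseBlock(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.up = nn.ConvTranspose1d(channels, channels, kernel_size=4, stride=2, padding=1)
        self.dec1 = DenseBlock(channels)
        self.out_conv = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):                    # x: (batch, 1, samples)
        e1 = self.enc1(self.in_conv(x))      # encoder level 1
        e2 = self.enc2(self.down(e1))        # encoder level 2 (half time resolution)
        t = e2.transpose(1, 2)               # (batch, time, channels) for attention
        a, _ = self.attn(t, t, t)            # self-attention over the whole utterance
        d = self.up(a.transpose(1, 2)) + e1  # upsample and add skip connection
        return self.out_conv(self.dec1(d))   # enhanced waveform estimate


if __name__ == "__main__":
    noisy = torch.randn(2, 1, 4096)          # two short excerpts (~256 ms at 16 kHz)
    print(TinyUNet()(noisy).shape)           # torch.Size([2, 1, 4096])
```

In this sketch the self-attention layer operates on the downsampled feature sequence, which is one plausible way to aggregate utterance-level context cheaply; the actual placement of attention and the multi-scale aggregation blocks in SADNUNet would follow the full paper.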
