Abstract

With the development of deep neural networks, speech enhancement has improved substantially. However, commonly used speech enhancement approaches cannot fully leverage contextual information across different scales, which limits further performance gains. To address this problem, we propose a nested U-Net with self-attention and dense connectivity (SADNUNet) for monaural speech enhancement in the time domain. SADNUNet is an encoder-decoder structure with skip connections. Within SADNUNet, a multi-scale aggregation block is proposed to capture richer contextual information from different scales. In this way, both global and local speech features can be fully exploited to improve speech reconstruction. Furthermore, dense connectivity and self-attention are incorporated into the network for better feature extraction and utterance-level context aggregation. Experimental results demonstrate that the proposed approach achieves performance on par with or better than competing models in objective speech intelligibility and quality scores.
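
The abstract does not specify the network's exact configuration, but the overall layout it describes (a time-domain encoder-decoder with skip connections, dense connectivity inside the blocks, and self-attention for utterance-level context) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the names (`DenseBlock`, `TinyUNet`), channel counts, kernel sizes, and number of levels are assumptions for clarity, not the authors' actual SADNUNet implementation.

```python
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    """Dense connectivity: each conv layer receives the concatenation of all
    earlier feature maps (hypothetical layout; the paper's exact block is not
    given in the abstract)."""
    def __init__(self, channels, growth=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv1d(in_ch, growth, kernel_size=3, padding=1),
                nn.PReLU()))
            in_ch += growth
        self.out = nn.Conv1d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.out(torch.cat(feats, dim=1))


class TinyUNet(nn.Module):
    """Two-level waveform encoder-decoder with a skip connection and a
    self-attention bottleneck, sketching the general structure described
    in the abstract (not the actual nested SADNUNet)."""
    def __init__(self, channels=32, num_heads=4):
        super().__init__()
        self.in_conv = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.enc1 = DenseBlock(channels)
        self.down = nn.Conv1d(channels, channels, kernel_size=4, stride=2, padding=1)
        self.enc2 = DenseBlock(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.up = nn.ConvTranspose1d(channels, channels, kernel_size=4, stride=2, padding=1)
        self.dec1 = DenseBlock(channels)
        self.out_conv = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):                    # x: (batch, 1, samples)
        e1 = self.enc1(self.in_conv(x))      # encoder level 1
        e2 = self.enc2(self.down(e1))        # encoder level 2 (half time resolution)
        t = e2.transpose(1, 2)               # (batch, time, channels) for attention
        a, _ = self.attn(t, t, t)            # self-attention over the whole utterance
        d = self.up(a.transpose(1, 2)) + e1  # upsample and add skip connection
        return self.out_conv(self.dec1(d))   # enhanced waveform estimate


if __name__ == "__main__":
    noisy = torch.randn(2, 1, 4096)          # two short excerpts (~256 ms at 16 kHz)
    print(TinyUNet()(noisy).shape)           # torch.Size([2, 1, 4096])
```

In this sketch the self-attention layer operates on the downsampled feature sequence, which is one plausible way to aggregate utterance-level context cheaply; the actual placement of attention and the multi-scale aggregation blocks in SADNUNet would follow the full paper.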
