DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement

Nasir Saleem, Sami Bourouis, Hela Elmannai, Abeer D Algarni

doi:10.1016/j.apacoust.2024.110131

Abstract

Dual-path Transformer-style models have proven effective for speech enhancement. However, their large parameter counts and computational complexity make practical deployment difficult. This study presents an encoder-decoder-based dual-path high-order transformer-style fully attentional network (DPHT-ANet) that addresses the speech enhancement problem with fewer parameters and reduced computational complexity. The DPHT-ANet incorporates a high-order information interaction module and replaces the multi-head attention module with a recursive gated convolution (GnConv). This enables the DPHT-ANet to capture deep-level information across the time and frequency dimensions, improving its ability to model complex temporal and spectral patterns. Furthermore, the DPHT-ANet uses a unified activation and attention mechanism in the convolutional encoder-decoder layers, resulting in a fully attentional network that prioritizes relevant high-level features at earlier stages. To further improve robustness, the DPHT-ANet uses interactive feature learning and fuses features of varying lengths and dimensions with pre-trained features from a large-scale dataset. Experimental results on the VCTK+DEMAND and WSJ0-SI84 datasets demonstrate the effectiveness of the proposed approach. On the WSJ0-SI84 dataset, the DPHT-ANet improves ESTOI by 38.28%, PESQ by 1.21, and SDR by 10.83 dB over the noisy mixture. Similarly, on the VCTK+DEMAND dataset, the DPHT-ANet improves STOI by 3.50%, PESQ by 1.22, and SegSNR by 9.93 dB over the noisy mixture.
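
The gains quoted in the abstract are differences between the score of the enhanced signal and the score of the unprocessed noisy mixture, both measured against the clean reference. The following is a minimal sketch of that bookkeeping for SDR, assuming the plain energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²). The abstract does not state which SDR variant the authors use, so the function names and the toy signals here are illustrative assumptions, not the paper's evaluation code.

```python
import math


def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB (assumed definition):
    10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0 or error_energy == 0.0:
        raise ValueError("SDR is undefined for a silent reference or a perfect estimate")
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_improvement_db(clean, noisy, enhanced):
    """Gain over the noisy mixture: SDR(enhanced) - SDR(noisy)."""
    return sdr_db(clean, enhanced) - sdr_db(clean, noisy)


# Toy signals (hypothetical): the "enhancer" leaves 10% of the noise amplitude.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = [c + n for c, n in zip(clean, noise)]
enhanced = [c + 0.1 * n for c, n in zip(clean, noise)]

gain = sdr_improvement_db(clean, noisy, enhanced)
print(f"SDR improvement: {gain:.2f} dB")
```

Scaling the residual noise amplitude by 0.1 scales its energy by 0.01, so the toy example yields a gain of about 20 dB. PESQ, STOI, and ESTOI improvements are reported the same way (enhanced score minus noisy score), but those metrics need dedicated implementations.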

Journal: Applied Acoustics
