Abstract

Speech enhancement is an essential task for improving the quality and intelligibility of speech signals corrupted by noise. Current deep neural network-based speech enhancement methods have achieved remarkable results. However, most of them operate only in the time domain or the time–frequency domain, and thus do not fully exploit the complementary advantages of the two domains. In this paper, we propose a framework with joint waveform and magnitude processing for single-channel speech enhancement, which exploits the complementary strengths of time-domain and time–frequency features. Specifically, the proposed network adopts a triple-stage training strategy. In the first two stages, two sub-networks take the waveform and the magnitude spectrum as input features, respectively, and each generates a pre-enhanced speech signal. In the third stage, a fusion sub-network combines the two pre-enhanced signals, further improving speech quality and intelligibility. All three sub-networks adopt an encoder-decoder architecture with skip connections, and an additional temporal convolutional network is inserted between the encoder and the decoder. To improve information flow through the network, we introduce a gating mechanism into the temporal convolutional network, which we refer to as the gated temporal convolutional network. In addition, the recently popular group communication strategy is introduced into the network, which significantly reduces the number of trainable parameters while achieving on-par or better enhancement performance. Experimental results demonstrate that the proposed method consistently outperforms advanced baselines in terms of objective speech quality and intelligibility metrics. Moreover, the proposed model exhibits outstanding cross-corpus and cross-language generalization.
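For illustration, the sketch below shows one plausible realization of the gated temporal convolutional block mentioned above. It is a minimal PyTorch example assuming WaveNet-style sigmoid gating with a residual connection; the class name GatedTCNBlock, the channel count, the dilation schedule, and the normalization choice are assumptions for the sketch, not the paper's exact configuration.

    import torch
    import torch.nn as nn


    class GatedTCNBlock(nn.Module):
        """Hypothetical gated temporal-convolutional block (illustrative only)."""

        def __init__(self, channels: int = 64, kernel_size: int = 3, dilation: int = 1):
            super().__init__()
            pad = (kernel_size - 1) * dilation
            # Two parallel dilated 1-D convolutions: a content path and a gate path.
            self.content = nn.Conv1d(channels, channels, kernel_size,
                                     dilation=dilation, padding=pad)
            self.gate = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation, padding=pad)
            self.norm = nn.GroupNorm(1, channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, frames)
            frames = x.shape[-1]
            c = torch.tanh(self.content(x))[..., :frames]   # content path, trimmed to causal length
            g = torch.sigmoid(self.gate(x))[..., :frames]   # sigmoid gate modulating the content
            return self.norm(x + c * g)                      # gated residual output


    if __name__ == "__main__":
        # Stack blocks with exponentially increasing dilations, as is typical for TCNs.
        tcn = nn.Sequential(*[GatedTCNBlock(dilation=2 ** d) for d in range(4)])
        dummy = torch.randn(1, 64, 200)                      # stand-in for an encoder output
        print(tcn(dummy).shape)                              # -> torch.Size([1, 64, 200])

In this sketch the gate rescales the content path element-wise before the residual sum, which is one common way a gating mechanism can regulate information flow between the encoder and decoder stages.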
