Abstract
A key step in single-channel speech enhancement is the orthogonal separation of speech and noise. In this paper, a dual-branch complex convolutional recurrent network (DBCCRN) is proposed to separate the complex spectrograms of speech and noise simultaneously. To model both local and global information, we incorporate conformer modules into our network. The orthogonality of the outputs of the two branches can be improved by optimizing Signal-to-Noise Ratio (SNR)-related losses. However, we found that models trained with two existing versions of SI-SNR yield enhanced speech at a scale very different from that of its clean counterpart, and the SNR loss likewise leads to a shrunken amplitude of the enhanced speech. A simple solution is to normalize the output, but this works only for offline processing, not for streaming. When streaming speech enhancement is required, the erroneous scale degrades speech quality. From an analytical inspection of the weaknesses of models trained with SNR and SI-SNR losses, a new loss function called scale-aware SNR (SA-SNR) is proposed to cope with the scale variations of the enhanced speech. SA-SNR improves over SI-SNR by introducing an extra regularization term that encourages the model to produce signals at a scale similar to that of the input, which has little influence on the perceptual quality of the enhanced speech. In addition, the commonly used evaluation recipes for speech enhancement may not sufficiently reflect the performance of methods trained with SI-SNR losses, where amplitude variations of the input speech should be carefully considered. A new evaluation metric called ScaleError is therefore introduced. Experiments show that our proposed method improves over existing baselines on the evaluation sets of the Voice Bank + DEMAND corpus and the INTERSPEECH 2020 Deep Noise Suppression Challenge, obtaining higher PESQ, STOI, SSNR, CSIG, CBAK, and COVL scores.
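The scale issue described above follows directly from the loss definitions: SI-SNR first projects the estimate onto the reference, so any rescaling of the output leaves the loss unchanged and the model is free to drift in amplitude. The sketch below illustrates this in plain NumPy. The SNR and SI-SNR definitions are the standard ones; the SA-SNR variant shown is a hypothetical illustration only, since the abstract does not give the paper's exact regularization term, and the names `sa_snr_loss`, `lam`, and the log-energy penalty are assumptions.

```python
import numpy as np

def snr_loss(est, ref, eps=1e-8):
    """Negative SNR: sensitive to the absolute scale of `est`."""
    noise = ref - est
    snr = 10 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -snr

def si_snr_loss(est, ref, eps=1e-8):
    """Negative SI-SNR: invariant to any rescaling of `est`,
    so a model trained with it may output an arbitrary scale."""
    # Project the estimate onto the reference (scale-invariant target).
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    si_snr = 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -si_snr

def sa_snr_loss(est, ref, noisy, lam=0.1, eps=1e-8):
    """Hypothetical scale-aware variant: SI-SNR plus a regularizer
    penalizing a mismatch between output energy and input energy,
    in the spirit of the SA-SNR loss described in the abstract."""
    scale_penalty = np.abs(np.log((np.sum(est ** 2) + eps) /
                                  (np.sum(noisy ** 2) + eps)))
    return si_snr_loss(est, ref, eps) + lam * scale_penalty
```

Note that `si_snr_loss(c * est, ref)` returns the same value for any c > 0, which is exactly why normalization is needed at inference time; the added penalty in `sa_snr_loss` removes that degree of freedom during training.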