Abstract
A key step in single-channel speech enhancement is the orthogonal separation of speech and noise. In this paper, a dual-branch complex convolutional recurrent network (DBCCRN) is proposed to separate the complex spectrograms of speech and noise simultaneously. To model both local and global information, we incorporate conformer modules into our network. The orthogonality of the outputs of the two branches can be improved by optimizing Signal-to-Noise Ratio (SNR)-related losses. However, we found that models trained with two existing versions of SI-SNR yield enhanced speech at a scale very different from that of its clean counterpart, and the SNR loss likewise leads to a shrunken amplitude of the enhanced speech. A simple solution is to normalize the output, but this works only for offline processing, not for streaming. When streaming speech enhancement is required, the erroneous scale degrades speech quality. From an analytical inspection of the weaknesses of models trained with SNR and SI-SNR losses, a new loss function called scale-aware SNR (SA-SNR) is proposed to cope with the scale variations of the enhanced speech. SA-SNR improves over SI-SNR by introducing an extra regularization term that encourages the model to produce signals at a scale similar to that of the input, which has little influence on the perceptual quality of the enhanced speech. In addition, the commonly used evaluation recipes for speech enhancement may not sufficiently reflect the performance of methods trained with SI-SNR losses, where amplitude variations of the input speech should be carefully considered. A new evaluation metric called ScaleError is therefore introduced. Experiments show that our proposed method improves over existing baselines on the evaluation sets of the Voice Bank + DEMAND corpus and the INTERSPEECH 2020 Deep Noise Suppression Challenge, obtaining higher PESQ, STOI, SSNR, CSIG, CBAK, and COVL scores.
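The scale issue described above follows directly from the loss definitions: SI-SNR first projects the estimate onto the reference, so any rescaling of the output leaves the loss unchanged and the model is free to drift in amplitude. The sketch below illustrates this in plain NumPy. The SNR and SI-SNR definitions are the standard ones; the SA-SNR variant shown is a hypothetical illustration only, since the abstract does not give the paper's exact regularization term, and the names `sa_snr_loss`, `lam`, and the log-energy penalty are assumptions.

```python
import numpy as np

def snr_loss(est, ref, eps=1e-8):
    """Negative SNR: sensitive to the absolute scale of `est`."""
    noise = ref - est
    snr = 10 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -snr

def si_snr_loss(est, ref, eps=1e-8):
    """Negative SI-SNR: invariant to any rescaling of `est`,
    so a model trained with it may output an arbitrary scale."""
    # Project the estimate onto the reference (scale-invariant target).
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    si_snr = 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -si_snr

def sa_snr_loss(est, ref, noisy, lam=0.1, eps=1e-8):
    """Hypothetical scale-aware variant: SI-SNR plus a regularizer
    penalizing a mismatch between output energy and input energy,
    in the spirit of the SA-SNR loss described in the abstract."""
    scale_penalty = np.abs(np.log((np.sum(est ** 2) + eps) /
                                  (np.sum(noisy ** 2) + eps)))
    return si_snr_loss(est, ref, eps) + lam * scale_penalty
```

Note that `si_snr_loss(c * est, ref)` returns the same value for any c > 0, which is exactly why normalization is needed at inference time; the added penalty in `sa_snr_loss` removes that degree of freedom during training.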