Single-channel Speech Enhancement Using Multi-Task Learning and Attention Mechanism

Jingyu Hou, Shenghui Zhao, Yubo An

doi:10.1109/icsip52628.2021.9688330

Abstract

Deep learning has brought major breakthroughs in speech enhancement. However, noise reduction performance under low signal-to-noise ratio (SNR) conditions and the noise generalization ability of the model still need improvement. To address these issues, a multi-task convolutional recurrent network (MT-CRN) is proposed and applied to single-channel speech enhancement. The MT-CRN estimates the magnitude spectra of both the clean speech and the noise from the noisy speech. In addition, a weighted complementary loss function is constructed to further improve the effectiveness of multi-task training, and a time-frequency attention mechanism is employed to capture the key information for each task. Experimental results show that the proposed MT-CRN clearly outperforms the baselines at lower SNR levels with high parameter efficiency and achieves stronger noise generalization performance.
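As a rough illustration of the two-task setup described above, the following sketch builds a noisy mixture at a chosen SNR and computes the magnitude spectra that could serve as the speech and noise targets. The function names, frame length, hop size, and sampling rate are assumptions made for this example and are not taken from the paper. The final function is a plain weighted sum of two mean-squared errors, used only as a stand-in; it is not the weighted complementary loss, whose exact form the abstract does not give.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean + noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise


def stft_magnitude(x, frame_len=320, hop=160):
    """Magnitude spectrogram of shape (frames, frame_len // 2 + 1).

    frame_len and hop are assumed values (20 ms / 10 ms at 16 kHz).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def weighted_two_task_loss(speech_est, speech_ref, noise_est, noise_ref, alpha=0.5):
    """Hypothetical stand-in: weighted sum of speech and noise MSE terms."""
    speech_loss = np.mean((speech_est - speech_ref) ** 2)
    noise_loss = np.mean((noise_est - noise_ref) ** 2)
    return alpha * speech_loss + (1 - alpha) * noise_loss


# Synthetic signals stand in for real speech and noise recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=-5.0)
noisy_mag = stft_magnitude(noisy)             # network input
speech_target = stft_magnitude(clean)         # target for the speech task
noise_target = stft_magnitude(scaled_noise)   # target for the noise task
```

In such a setup, a network would map `noisy_mag` to two outputs, one per task, and the weight `alpha` would trade off the two error terms during training.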

Full Text