Abstract
In this article, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. First, a time-frequency domain speech separation method is applied as the pre-separation stage. The aim of this stage is to perform a preliminary separation of the mixture; although it separates the sources, the output still contains residual interference. To enhance the pre-separated speech and further improve the separation performance, the end-to-end post-filter (E2EPF) with deep attention fusion features is proposed. The E2EPF makes full use of the prior knowledge provided by the pre-separated speech, which benefits separation. It is a fully convolutional speech separation network that takes the waveform as input. First, a 1-D convolutional layer extracts deep representation features from the mixture and pre-separated signals in the time domain. Second, to pay more attention to the outputs of the pre-separation stage, an attention module computes the similarity between the mixture and the pre-separated speech to obtain deep attention fusion features. These features help reduce the interference and enhance the pre-separated speech. Finally, they are fed to the post-filter to estimate each target signal. Experimental results on the WSJ0-2mix dataset show that the proposed method outperforms the state-of-the-art speech separation method. Compared with the pre-separation method, the proposed method achieves relative improvements of 64.1%, 60.2%, 25.6%, and 7.5% in scale-invariant source-to-noise ratio (SI-SNR), signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI), respectively.
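To make the attention-fusion step concrete, the following is a minimal PyTorch sketch under our own assumptions; it is not the paper's implementation, and the module and parameter names (AttentionFusion, proj_mix, proj_pre, the encoder settings) are hypothetical. It illustrates the idea described above: deep features of the mixture and the pre-separated speech are projected, a similarity matrix between the two is computed, and the attended pre-separated features are fused with the mixture features.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Hypothetical sketch of similarity-based fusion of mixture features
    with pre-separated features (names and sizes are illustrative)."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions act as learnable projections of the deep features
        self.proj_mix = nn.Conv1d(channels, channels, kernel_size=1)
        self.proj_pre = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, mix_feat, pre_feat):
        # mix_feat, pre_feat: (batch, channels, frames) deep representation
        # features from a 1-D convolutional encoder applied to the waveforms
        q = self.proj_mix(mix_feat)
        k = self.proj_pre(pre_feat)
        # similarity between mixture and pre-separated representations
        attn = torch.softmax(
            torch.bmm(q.transpose(1, 2), k) / (q.size(1) ** 0.5), dim=-1
        )
        # attend over the pre-separated features and fuse with the mixture
        attended = torch.bmm(pre_feat, attn.transpose(1, 2))
        # deep attention fusion features: (batch, 2 * channels, frames)
        return torch.cat([mix_feat, attended], dim=1)


# Example usage with an assumed 1-D convolutional encoder on 8 kHz waveforms
encoder = nn.Conv1d(1, 256, kernel_size=16, stride=8)
fusion = AttentionFusion(256)
mix = torch.randn(4, 1, 8000)   # mixture waveforms
pre = torch.randn(4, 1, 8000)   # pre-separated waveforms
fused = fusion(encoder(mix), encoder(pre))  # fed to the post-filter network
```

In this sketch the fused features would then be passed to the post-filter network that estimates each target signal; the concatenation-based fusion is one plausible choice, not necessarily the one used in the paper.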