Abstract
In this paper, a neural network based on a convolutional recurrent network (CRN) architecture is proposed for speech enhancement in the time-frequency (T-F) domain. The proposed model consists of a convolutional encoder-decoder (CED) with gated convolutional units, a skip-connection module named attention gate units (AGUs), and a two-stage modeling module called a grouped F-T-LSTM, which uses multiple long short-term memory (LSTM) layers. The encoder transforms the input noisy speech into high-dimensional feature maps, and the gated convolutional units control the information flow through the network. Instead of directly concatenating the output of each encoder layer with the input of the corresponding decoder layer, attention gate units are developed as skip connections to better integrate the two sources of information. The grouped F-T-LSTM module models both the relations between frequency bands and the temporal dependencies. Finally, the decoder uses the outputs of the AGUs and the grouped F-T-LSTM to produce a mask. The proposed model is trained with a loss function that is a weighted combination of a time-domain loss and a time-frequency loss. Experimental results show that the proposed model achieves superior performance compared with other existing speech enhancement methods.
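For concreteness, a minimal sketch of such a weighted time-domain and time-frequency loss is shown below in PyTorch; the weight alpha, the choice of L1 distances, and the STFT settings are illustrative assumptions rather than the configuration reported in the paper.

```python
import torch
import torch.nn.functional as F

def weighted_td_tf_loss(est_wave, clean_wave, alpha=0.5, n_fft=512, hop=256):
    """Weighted sum of a time-domain loss and a T-F magnitude loss.

    alpha, the use of L1 distances, and the STFT parameters are
    illustrative choices, not the settings used in the paper.
    """
    # Time-domain term: L1 distance between estimated and clean waveforms.
    td_loss = F.l1_loss(est_wave, clean_wave)

    # Time-frequency term: L1 distance between STFT magnitudes.
    window = torch.hann_window(n_fft, device=est_wave.device)
    est_spec = torch.stft(est_wave, n_fft, hop, window=window, return_complex=True)
    clean_spec = torch.stft(clean_wave, n_fft, hop, window=window, return_complex=True)
    tf_loss = F.l1_loss(est_spec.abs(), clean_spec.abs())

    # Weighted combination of the two terms.
    return alpha * td_loss + (1 - alpha) * tf_loss
```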