Abstract

The spectrogram of speech exhibits time dependence and local correlation, and classical neural network models are not well suited to these time-frequency characteristics. A convolutional recurrent network (CRN) can both capture the local patterns of the spectrogram and model the dependencies between consecutive frames. In this paper, we introduce gated linear units (GLUs) into the convolutional encoder-decoder (CED) structure of the CRN and design a gated convolutional recurrent network (GCRN) for real-time monaural speech enhancement. We replace the long short-term memory (LSTM) in the CRN with a gated recurrent unit (GRU) and introduce a grouping strategy. The grouped GRU module captures the temporal dependence between consecutive frames while reducing model complexity. Experimental results show that the GCRN model outperforms the baseline model in objective speech quality and intelligibility under all test conditions. In addition, GCRN greatly reduces the number of trainable parameters.
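To make the two ingredients named above concrete, the following minimal sketch shows a gated linear unit on plain Python lists and a rough parameter count for LSTM, GRU, and grouped GRU layers. It is an illustration under stated assumptions, not the paper's implementation: the function names, the layer sizes (1024 inputs, 1024 hidden units, 2 groups), and the convention of one bias vector per gate are all hypothetical choices made here.

```python
import math


def _sigmoid(y):
    # Numerically stable logistic function.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def glu(a, b):
    """Gated linear unit: elementwise a * sigmoid(b).

    `a` is the linear path and `b` the gate path; in a gated convolution
    both would come from separate convolutions over the same input.
    """
    return [x * _sigmoid(y) for x, y in zip(a, b)]


def lstm_params(input_size, hidden_size):
    # 4 gate blocks, each with input weights, recurrent weights and
    # one bias vector (assumed convention; some libraries use two).
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def gru_params(input_size, hidden_size):
    # Same layout as above, but a GRU has 3 gate blocks instead of 4.
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)


def grouped_gru_params(input_size, hidden_size, groups):
    # Split features into `groups` independent, smaller GRUs.
    if input_size % groups or hidden_size % groups:
        raise ValueError("sizes must be divisible by the number of groups")
    return groups * gru_params(input_size // groups, hidden_size // groups)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    print(glu([1.0, 2.0], [0.0, 0.0]))
    print(lstm_params(1024, 1024))
    print(gru_params(1024, 1024))
    print(grouped_gru_params(1024, 1024, 2))
```

Under these assumptions, moving from an LSTM to a GRU removes one of four gate blocks, and splitting the GRU into G groups divides the weight-matrix count by roughly G, since each group only connects its own slice of the input to its own slice of the hidden state.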
