A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain

Jian Pang, Hongqing Liu, Xiangning Liao, Hui Wang, Hongcheng Li, Tao Jiang, Le Luo

doi:10.3390/app13137698

Abstract

This study investigates the use of complex-valued operations for multichannel speech enhancement in the time domain with a neural network. Previous studies have demonstrated the advantages of complex operations in neural network design, but they have focused solely on frequency-domain enhancement. In contrast, we present an end-to-end approach that performs speech enhancement in the time domain. We used the Hilbert transform to generate complex time-domain waveforms as network inputs, which allows the end-to-end approach to exploit spatial information. To process these complex-valued inputs, we developed a complex neural adaptive beamformer (CNAB) that uses complex shared long short-term memory (LSTM), split-LSTM, and complex convolutions to generate the beamforming output. We then developed a complex full convolutional network (CFCN) to enhance the beamforming output, using complex dilated convolutions to model the long-term temporal dependencies of speech. Cascading the CNAB and CFCN yields the final end-to-end time-domain enhancement network, named CNABCFCN. We trained and tested CNABCFCN on the deep noise suppression (DNS) challenge dataset. The results demonstrate the advantages of complex operations over the baseline model. Furthermore, the proposed CNABCFCN outperformed other networks on both objective and subjective measures.
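To make the input construction concrete, the following is a minimal sketch of how a Hilbert transform can turn real multichannel waveforms into complex time-domain signals (the analytic signal, whose real part is the original waveform and whose imaginary part is its Hilbert transform). It is not the authors' implementation: the function name `analytic_signal`, the 16 kHz sampling rate, and the synthetic two-microphone tone are assumptions chosen only for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Return the complex analytic signal of real waveform(s) along the last axis.

    Hypothetical helper: uses the standard FFT construction (zero the negative
    frequencies, double the positive ones).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)

    # Frequency-domain weights: keep DC (and Nyquist when n is even),
    # double positive frequencies, zero negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0

    return np.fft.ifft(spectrum * h, axis=-1)


# Assumed toy input: two microphones, 1 s at 16 kHz, a 440 Hz tone with a
# small inter-channel delay standing in for spatial information.
fs = 16000
t = np.arange(fs) / fs
mics = np.stack([
    np.cos(2 * np.pi * 440 * t),
    np.cos(2 * np.pi * 440 * (t - 1e-4)),
])

z = analytic_signal(mics)  # complex array, one row per channel
```

In this sketch the real part of `z` reproduces the microphone signals, while the imaginary part carries the 90-degree phase-shifted copy, so the inter-channel delay appears as a phase difference between the complex channels. A complex-valued network could consume `z` directly or as stacked real and imaginary parts; which of these the paper uses is not stated in the abstract.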

Full Text