Multi-objective based multi-channel speech enhancement with BiLSTM network

Xingyue Cui, Zhe Chen, Fuliang Yin

doi:10.1016/j.apacoust.2021.107927

Abstract

Speech enhancement in noisy and reverberant conditions is a crucial and challenging task in many scenarios. In this paper, a multi-objective multi-channel speech enhancement method based on the bidirectional long short-term memory (BiLSTM) network is proposed to deal with noise and reverberation. First, for each channel of the microphone array, the log-power spectra (LPS) of the noisy speech are fed into a BiLSTM network that predicts both the LPS and the ideal ratio mask (IRM) of the clean speech. Then, a fusion layer combines the intermediate LPS and IRM features obtained from all channels into a single-channel LPS. Finally, a deep neural network (DNN) is incorporated to further learn the mapping between the fused single-channel LPS and the LPS of the clean speech. Compared with single-channel and single-objective methods, the proposed method achieves significant improvements in speech enhancement and is robust to noise and reverberation. Experimental results demonstrate the validity and adaptability of the proposed method.
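The abstract describes the per-channel features only at a high level. As a rough illustration, the following standard-library Python sketch computes the LPS of one frame and an IRM from known speech and noise power spectra. It also includes a hypothetical rule that fuses per-channel LPS and IRM estimates into one single-channel LPS. The IRM exponent `beta=0.5`, the averaging rule in `fuse_channels`, and all function names are assumptions made for illustration. They are not the paper's fusion layer, which is part of the trained network. The BiLSTM and DNN stages are not shown.

```python
import cmath
import math

EPS = 1e-8  # small floor so that log() and division never see zero


def power_spectrum(frame):
    """Squared DFT magnitude of one real-valued frame (bins 0..N/2 only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_power_spectrum(frame):
    """LPS feature of one frame: natural log of the power spectrum."""
    return [math.log(p + EPS) for p in power_spectrum(frame)]


def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """IRM per bin, (S^2 / (S^2 + N^2)) ** beta, computed from power spectra.

    beta = 0.5 is a common choice; it is an assumption here, not taken from the paper.
    """
    return [(s / (s + n + EPS)) ** beta for s, n in zip(speech_pow, noise_pow)]


def fuse_channels(noisy_lps, pred_lps, pred_irm):
    """Hypothetical fusion of per-channel estimates into one single-channel LPS.

    Each argument is a list (one entry per channel) of per-bin lists.
    For each channel, the predicted IRM (a magnitude mask with beta = 0.5) is
    turned into an LPS estimate: log(IRM^2 * |Y|^2) = LPS_Y + 2 * log(IRM).
    That estimate is averaged with the directly predicted LPS, and the result
    is averaged over channels. A trained fusion layer would learn its weights.
    """
    n_ch = len(noisy_lps)
    n_bins = len(noisy_lps[0])
    fused = [0.0] * n_bins
    for y, s_hat, m_hat in zip(noisy_lps, pred_lps, pred_irm):
        for k in range(n_bins):
            masked = y[k] + 2.0 * math.log(m_hat[k] + EPS)
            fused[k] += 0.5 * (s_hat[k] + masked) / n_ch
    return fused


# Tiny usage example: one 4-sample frame, duplicated to mimic two channels.
frame = [1.0, 0.5, -0.5, -1.0]
noisy = log_power_spectrum(frame)
print(len(noisy))  # 3 bins for a 4-point frame
mask = ideal_ratio_mask([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
fused = fuse_channels([noisy, noisy], [noisy, noisy], [mask, mask])
```

Averaging the mask-derived and mapping-based LPS estimates is only one simple way to use both objectives together. In the method described above, a fusion layer performs this combination, and a DNN then refines the fused LPS toward the clean-speech LPS.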

Full Text