Abstract

Speech enhancement in noisy and reverberant conditions is a crucial and challenging task in many scenarios. In this paper, a multi-objective, multi-channel speech enhancement method based on a bidirectional long short-term memory (BiLSTM) network is proposed to deal with noise and reverberation. First, for each channel of the microphone array, the log-power spectra (LPS) of the noisy speech are fed to a BiLSTM network, which predicts both the LPS and the ideal ratio mask (IRM) of the clean speech. Then, a fusion layer combines the intermediate LPS and IRM features obtained from all channels into a single-channel LPS. Finally, a deep neural network (DNN) further learns the relationship between the fused single-channel LPS and the LPS of the clean speech. Compared with single-channel and single-objective methods, the proposed method achieves significant improvements in speech enhancement and is robust to noise and reverberation. Experimental results demonstrate the validity and adaptability of the proposed speech enhancement method.
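
To make the two prediction targets concrete, the following minimal Python sketch computes the LPS and the IRM for a single toy frame of short-time Fourier transform (STFT) bins. It is not the authors' implementation. The natural logarithm, the `EPS` floor, the exponent `beta = 0.5` in the IRM, the function names, and the bin values are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math

EPS = 1e-12  # assumed small floor to avoid log(0) and division by zero


def log_power_spectrum(stft_frame):
    """Return the LPS of one STFT frame: log of squared magnitude per bin."""
    return [math.log(abs(x) ** 2 + EPS) for x in stft_frame]


def ideal_ratio_mask(clean_frame, noise_frame, beta=0.5):
    """Return the IRM per bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    beta = 0.5 is a common convention, assumed here; values lie in [0, 1].
    """
    mask = []
    for s, n in zip(clean_frame, noise_frame):
        ps, pn = abs(s) ** 2, abs(n) ** 2
        mask.append((ps / (ps + pn + EPS)) ** beta)
    return mask


# Toy frame with three frequency bins (made-up values, one channel only).
clean = [1.0 + 0.0j, 0.0 + 2.0j, 0.5 + 0.5j]
noise = [1.0 + 0.0j, 0.0 + 0.0j, 1.0 - 1.0j]
noisy = [s + n for s, n in zip(clean, noise)]

lps_noisy = log_power_spectrum(noisy)  # network input in the described setup
lps_clean = log_power_spectrum(clean)  # first training target
irm = ideal_ratio_mask(clean, noise)   # second training target
```

In a multi-channel setting, these quantities would be computed per microphone channel, with the noisy LPS as network input and the clean LPS and IRM as the two learning objectives.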
