Abstract

Recent studies on monaural source separation (MSS) have introduced the long short-term memory (LSTM) network to address the problem; however, its performance remains limited, particularly in real room environments. According to their training objectives, LSTM-based MSS methods fall into three categories: mapping-based, masking-based and signal approximation (SA)-based methods. In this paper, we introduce the dereverberation mask (DM) and establish a system that trains two SA-LSTMs sequentially to dereverberate the speech mixture and improve separation performance. The DM is used as the training target of the first LSTM. The enhanced ratio mask (ERM) is then proposed and set as the training target of the second LSTM. We evaluate the proposed method on the IEEE and TIMIT datasets with real room impulse responses and noise interferences from the NOISEX dataset. The detailed evaluations confirm that the proposed method outperforms the state of the art.
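To make the distinction between masking-based and SA-based training objectives concrete, the following is a minimal sketch in NumPy. It uses a generic ratio mask on magnitude spectrograms, not the DM or ERM, whose definitions are not given in this abstract. The function names, the toy spectrogram shapes, and the assumption that the mixture magnitude is the sum of the target and interference magnitudes are all illustrative choices, not taken from the paper.

```python
import numpy as np

def ideal_ratio_mask(target_mag, interference_mag, eps=1e-8):
    """Generic ratio mask (illustrative; not the paper's DM or ERM)."""
    return target_mag / (target_mag + interference_mag + eps)

def masking_loss(est_mask, ref_mask):
    """Masking objective: mean squared error measured in the mask domain."""
    return float(np.mean((est_mask - ref_mask) ** 2))

def signal_approximation_loss(est_mask, mixture_mag, target_mag):
    """SA objective: mean squared error measured in the signal domain,
    after applying the estimated mask to the mixture magnitude."""
    return float(np.mean((est_mask * mixture_mag - target_mag) ** 2))

# Toy magnitude spectrograms (frequency bins x time frames), hypothetical data.
rng = np.random.default_rng(0)
target_mag = rng.random((4, 5))
interference_mag = rng.random((4, 5))
# Simplifying assumption: magnitudes add in the mixture.
mixture_mag = target_mag + interference_mag

ref_mask = ideal_ratio_mask(target_mag, interference_mag)
# Stand-in for a network output: the reference mask with a constant error.
noisy_mask = np.clip(ref_mask + 0.1, 0.0, 1.0)

mask_err = masking_loss(noisy_mask, ref_mask)
sa_err = signal_approximation_loss(noisy_mask, mixture_mag, target_mag)
```

In this sketch the masking loss penalizes every time-frequency bin equally, whereas the SA loss weights each mask error by the mixture magnitude in that bin, so errors in high-energy bins cost more.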
