Abstract

In this study, we explore long short-term memory recurrent neural networks (LSTM-RNNs) for speech enhancement. First, a regression LSTM-RNN approach for a direct mapping from noisy to clean speech features is presented and verified to be more effective than deep neural network (DNN) based regression techniques at modeling long-term acoustic context. Then, a comprehensive comparison between the proposed direct-mapping LSTM-RNN and ideal ratio mask (IRM) based LSTM-RNNs is conducted. We observe that the direct-mapping framework achieves better speech intelligibility at low signal-to-noise ratios (SNRs), while the IRM approach performs better at high SNRs. Accordingly, to fully exploit this complementarity, a novel multiple-target joint learning approach is designed. Experiments under unseen noises show that the proposed framework can consistently and significantly improve objective measures of both speech quality and intelligibility.
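As background for the IRM-based training target mentioned above, the ideal ratio mask is conventionally computed per time-frequency bin from the clean-speech and noise power spectra. The following is a minimal NumPy sketch, not code from the paper; the function name, the small epsilon for numerical stability, and the exponent beta = 0.5 (a common choice in the masking literature) are illustrative assumptions:

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Illustrative IRM: (S / (S + N))**beta per time-frequency bin.

    speech_power, noise_power: non-negative arrays of the same shape,
    e.g. magnitude-squared STFT bins. beta = 0.5 is a common choice;
    the epsilon guards against division by zero in silent bins.
    """
    return (speech_power / (speech_power + noise_power + 1e-12)) ** beta

# At enhancement time, a predicted mask is applied pointwise to the
# noisy spectrogram:  enhanced_magnitude = mask * noisy_magnitude
```

The mask lies in [0, 1], which makes it a convenient regression target for a network with a sigmoid output layer; the direct-mapping approach instead regresses the clean features themselves.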
