Abstract

This study proposes a bi-directional recurrent neural network (Bi-RNN) post-processing method for speech enhancement (SE) at low signal-to-noise ratios (SNRs). Current SE solutions perform poorly under low-SNR conditions. Loizou and Kim proposed a solution that reduces speech distortion errors in the time-frequency (T-F) domain, but it requires knowledge of the ground truth. Because the ground truth is unknown in real-life applications, the current study proposes using a Bi-RNN to implement Loizou and Kim’s solution as a post-processing method for SE engines, so that no prior knowledge of the ground truth is required. The effectiveness of the proposed method is investigated with a spectral subtraction (SS) SE engine, a non-negative matrix factorization (NMF) SE engine, and a deep neural network ideal ratio mask (DNN-IRM) SE engine, under matched and mismatched noise and at different SNRs. Experimental results demonstrate that the proposed post-processing method improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores for all of these SE engines, especially at low SNRs.
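
To illustrate why a ground-truth-based T-F constraint cannot be applied directly in practice, the following minimal sketch assumes the constraint takes the form of a binary mask that discards T-F units whose enhanced magnitude exceeds a fixed multiple of the clean magnitude. The factor of 2 (about 6 dB of amplification distortion), the function names, and the toy magnitudes are all assumptions made for illustration; none of them is stated in this abstract. The point is only that the mask depends on the clean magnitudes, which a post-processor such as the proposed Bi-RNN would presumably have to estimate instead.

```python
# Hypothetical oracle constraint: every name and value here is illustrative.

def oracle_distortion_mask(enhanced_mag, clean_mag, max_gain=2.0):
    """Return a binary T-F mask (list of rows).

    A unit is kept (1.0) when the enhanced magnitude is at most
    max_gain times the clean magnitude, and dropped (0.0) otherwise.
    Computing it requires clean_mag, i.e. the ground truth.
    """
    return [
        [1.0 if e <= max_gain * c else 0.0 for e, c in zip(e_row, c_row)]
        for e_row, c_row in zip(enhanced_mag, clean_mag)
    ]


def apply_mask(enhanced_mag, mask):
    """Multiply each enhanced T-F magnitude by its mask value."""
    return [
        [e * m for e, m in zip(e_row, m_row)]
        for e_row, m_row in zip(enhanced_mag, mask)
    ]


# Toy 2x2 magnitude spectrograms (rows: frequency bins, columns: frames).
clean = [[1.0, 0.5], [0.2, 2.0]]
enhanced = [[1.5, 1.2], [0.1, 4.5]]

mask = oracle_distortion_mask(enhanced, clean)
constrained = apply_mask(enhanced, mask)
```

In this toy example the two units with more than a twofold overestimate (1.2 against 0.5, and 4.5 against 2.0) are zeroed, while the other two pass through unchanged.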
