Abstract

In many acoustic conditions, a recorded single-channel speech signal can be severely degraded by reverberation and noise, reducing speech quality and intelligibility. This paper proposes a novel two-stage model scheme for single-channel speech dereverberation and denoising that decomposes the room impulse response (RIR) into two convolutive parts. As in previous methods, the proposed two-stage model uses non-negative approximations of the convolutive transfer function (NCTF) to simultaneously estimate the magnitude spectrograms of the speech and the RIR. In the first stage, the model parameters are updated iteratively to estimate a less reverberant speech signal and one short RIR; in the second stage, the clean speech signal and the other short RIR are estimated by further iterative updates. Denoising steps are included in both stages so that noise is removed more thoroughly. A straightforward method based on this scheme is built to enhance speech from the noisy reverberant signal, and two fusion methods inspired by ensemble learning are then proposed for speech enhancement. By decomposing a long RIR into two shorter ones, the proposed methods enhance speech more effectively and take less time. Additionally, the optimal estimator is derived based on temporal stacking to exploit speech temporal dynamics. Experiments on two simulated RIRs and one real RIR compare the proposed methods with a state-of-the-art method; the results show that the proposed methods achieve better or comparable performance on most measures, the exception being phone error rate.
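The decomposition idea can be illustrated with a minimal NumPy sketch of the noise-free forward model only. It assumes an NCTF-style model in which each frequency band of a non-negative magnitude spectrogram is convolved along time with a short non-negative filter. The function name `nctf_apply`, the array sizes, and the random data are hypothetical and chosen for illustration; the sketch does not implement the iterative parameter updates, the denoising steps, or the fusion methods described above.

```python
import numpy as np

def nctf_apply(S, H):
    """Convolve each frequency band of S (K x N) along time with its
    filter in H (K x L). Returns an array of shape (K, N + L - 1)."""
    K, N = S.shape
    L = H.shape[1]
    X = np.zeros((K, N + L - 1))
    for l in range(L):
        # Tap l delays the spectrogram by l frames and scales it per band.
        X[:, l:l + N] += H[:, [l]] * S
    return X

rng = np.random.default_rng(0)
K, N, L1, L2 = 4, 50, 6, 5          # bands, frames, filter lengths (illustrative)
S = rng.random((K, N))              # stand-in for a clean magnitude spectrogram
H1 = rng.random((K, L1))            # first short non-negative filter
H2 = rng.random((K, L2))            # second short non-negative filter

# Two short filters applied in sequence: S -> less reverberant -> reverberant.
Y = nctf_apply(S, H2)
X_two = nctf_apply(Y, H1)

# Equivalent single long filter of length L1 + L2 - 1.
H_long = nctf_apply(H2, H1)
X_one = nctf_apply(S, H_long)

print(H_long.shape, np.allclose(X_two, X_one))
```

Because convolution is associative, the two short filters in sequence reproduce the effect of one filter of length L1 + L2 - 1 in this idealized setting, which is the structural property a two-stage scheme can exploit. With real magnitude spectrograms the non-negative model is itself an approximation, so the equality would hold only approximately.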
