Abstract

In many acoustic conditions, a single-channel recorded speech signal may be severely affected by reverberation and noise, reducing speech quality and intelligibility. This paper proposes a novel two-stage processing scheme for single-channel speech dereverberation and denoising that enhances the spectrum of the noisy reverberant signal. As in previous methods, the proposed method uses a non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech signal and the room impulse response (RIR). The novelty of the proposed algorithm lies in decomposing the RIR into two parts to build a two-stage processing scheme for enhancing speech in noisy environments. In the first stage, iterative updates estimate a less reverberant speech signal and a short RIR; in the second stage, iterative updates estimate the clean speech signal and another short RIR. Both stages include denoising steps. By decomposing the long RIR into two parts, the proposed algorithm enhances speech more effectively and requires less computation time. Additionally, the optimal estimator is derived based on temporal stacking to exploit speech temporal dynamics. Experiments on two simulated RIRs compare the proposed method with a state-of-the-art method, and the results show that the proposed method significantly improves the quality and intelligibility of the enhanced speech.
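The idea of splitting one long RIR into two short ones can be illustrated with a minimal sketch of the N-CTF forward model, in which each frequency bin of the reverberant magnitude spectrogram is approximated by a non-negative convolution along time of the speech magnitudes with a per-bin filter. The sketch assumes that the long filter is the per-bin convolution of the two short filters; the abstract does not state the exact form of the decomposition, so this is an assumption made for illustration only. The function name `nctf_forward`, the variable names, the array sizes, and the random test data are all hypothetical, and the sketch covers only the noise-free forward model, not the iterative estimation or the denoising steps.

```python
import numpy as np

def nctf_forward(S, H):
    """Noise-free N-CTF forward model (illustrative sketch).

    S: (K, T) non-negative speech magnitudes, K frequency bins, T frames.
    H: (K, L) non-negative per-bin filter of L frames.
    Returns Y with Y[k, t] = sum over tau of H[k, tau] * S[k, t - tau].
    """
    K, T = S.shape
    Y = np.zeros((K, T))
    for k in range(K):
        # Full convolution along time, truncated to the original T frames.
        Y[k] = np.convolve(S[k], H[k])[:T]
    return Y

# Hypothetical sizes and random non-negative data, for illustration only.
rng = np.random.default_rng(0)
K, T, L1, L2 = 4, 50, 5, 6
S = rng.random((K, T))    # stand-in for the clean speech magnitudes
H1 = rng.random((K, L1))  # short filter handled in the first stage
H2 = rng.random((K, L2))  # short filter handled in the second stage

# Assumption: the long filter is the per-bin convolution of the short ones.
H_long = np.stack([np.convolve(H1[k], H2[k]) for k in range(K)])

# One long filter applied to the clean speech.
Y_one = nctf_forward(S, H_long)

# Two short filters in cascade: S -> less reverberant signal -> observation.
X_less_reverberant = nctf_forward(S, H2)
Y_two = nctf_forward(X_less_reverberant, H1)

max_diff = np.max(np.abs(Y_one - Y_two))
print(max_diff)  # differs from zero only by floating-point rounding
```

Under this assumption the cascade reproduces the single long filter because convolution is associative, which is why a first stage can target the intermediate, less reverberant signal and a second stage the clean speech, each with a filter shorter than the full RIR.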
