Abstract

Exemplar-based nonnegative models, in which noisy speech is decomposed as a sparse nonnegative linear combination of speech and noise exemplars stored in a dictionary, have been used successfully for speech denoising. This paper extends the technique to single-channel speech enhancement in noisy reverberant environments, using a novel approximation of noisy reverberant speech in the frequency domain together with nonnegative matrix deconvolution. In the proposed model, the room impulse response (RIR) in the magnitude short-time Fourier transform domain is defined such that its decaying structure can also be estimated from the test data itself. Existing models, by contrast, impose a decaying structure through a suboptimal binwise clamping procedure, and that imposed structure does not hold in a typical RIR. The paper presents multiplicative updates for estimating the RIR, its decay, and the underlying anechoic speech and noise. The proposed model is evaluated on a synthetic dataset created by convolving TIMIT recordings with RIRs measured in different rooms and at varying speaker and microphone locations, and adding background noise taken from the CHiME corpus. Simulation results show that the proposed model yields a better RIR estimate than the existing model and improves various instrumental speech quality measures.
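
To make the starting point concrete, the following is a minimal sketch of the exemplar-based decomposition that the abstract builds on, not the update rules proposed in the paper. It assumes a fixed dictionary and uses the standard multiplicative update for sparse nonnegative matrix factorization under the generalized Kullback-Leibler divergence. The RIR and its decay, which the paper estimates through nonnegative matrix deconvolution, are deliberately left out. All function names, the random toy dictionaries, and the sparsity weight are illustrative assumptions.

```python
import numpy as np

def kl_divergence(Y, Y_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between nonnegative arrays."""
    return float(np.sum(Y * np.log((Y + eps) / (Y_hat + eps)) - Y + Y_hat))

def estimate_activations(Y, D, sparsity=0.1, n_iter=200, eps=1e-12, seed=0):
    """Find sparse nonnegative X with Y ~ D @ X, keeping the dictionary D fixed.

    Y: (bins, frames) magnitude spectrogram; D: (bins, atoms) exemplars.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((D.shape[1], Y.shape[1])) + eps
    # Denominator of the KL update; the sparsity weight adds an L1 penalty on X.
    denom = D.T @ np.ones_like(Y) + sparsity + eps
    for _ in range(n_iter):
        X *= (D.T @ (Y / (D @ X + eps))) / denom
    return X

def separate_speech(Y, D_speech, D_noise, **kwargs):
    """Estimate the speech magnitude using a soft mask built from both models."""
    D = np.hstack([D_speech, D_noise])
    X = estimate_activations(Y, D, **kwargs)
    n_speech = D_speech.shape[1]
    speech = D_speech @ X[:n_speech]
    noise = D_noise @ X[n_speech:]
    mask = speech / (speech + noise + 1e-12)
    return mask * Y

# Toy data (hypothetical): random "exemplars" stand in for real spectra.
rng = np.random.default_rng(1)
D_speech = rng.random((64, 20))
D_noise = rng.random((64, 10))
X_true = rng.random((30, 50)) * (rng.random((30, 50)) < 0.2)
Y = np.hstack([D_speech, D_noise]) @ X_true
speech_estimate = separate_speech(Y, D_speech, D_noise)
```

In a reverberant setting, the speech term `D_speech @ X[:n_speech]` would additionally be convolved along the time axis with a nonnegative per-bin RIR magnitude, which is the part this sketch omits.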
