Abstract

This paper considers the suppression of late reverberation and additive noise in single-channel speech recordings. Reverberation introduces long-term correlation in the observed signal. In the first part of this work, we show how this correlation can be used to estimate the late reverberant spectral variance (LRSV) without assuming a specific model for the room impulse responses (RIRs) and without requiring explicit estimates of RIR model parameters. This makes the correlation-based approach more robust to RIR modeling errors. However, the correlation-based method can track only slow time variations in the RIRs. Existing model-based methods use statistical models for the RIRs that depend on one or more parameters, which have to be estimated blindly. The common statistical models lead to simple expressions for the LRSV that depend on past values of the spectral variance of the reverberant, noise-free signal. All model-based LRSV estimators in the existing literature are derived under the assumption that the RIRs are time-invariant realizations of a stochastic process. In the second part of this paper, we go one step further and analyze time-varying RIRs. We show that in this case the reverberance tends to become decorrelated. We also discuss the relations between different RIR models and their corresponding LRSV estimators.
We show theoretically that simple estimators similar to those of the time-invariant case exist, provided that the reverberation time T60 and the direct-to-reverberation ratio (DRR) of the RIRs remain nearly constant over an interval on the order of a few frames. We also show that, in DFT-based enhancement algorithms, the reverberation time can be taken to be independent of the frequency bin. Experiments with time-varying RIRs validate the analysis. Experiments with additive nonstationary noise and time-invariant RIRs show the influence of blind estimation of the reverberation time and the DRR.
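The "simple expressions" for the LRSV mentioned above can be illustrated with the classic estimator obtained from Polack's time-invariant statistical RIR model, in which the RIR is white noise with an exponentially decaying envelope. The Python sketch below is a minimal illustration under that assumption. It is not the correlation-based or time-varying estimator proposed in this paper. The function names, the recursive PSD smoothing and its constant `alpha`, the frame hop `hop`, and the late-reverberation delay `n_late` (in frames) are hypothetical choices made for the example. Passing a scalar `t60` uses the same reverberation time in every DFT bin.

```python
import numpy as np


def polack_decay_rate(t60):
    """Decay rate Delta (1/s) in Polack's model h(t) = b(t) * exp(-Delta * t).

    The energy envelope exp(-2 * Delta * t) falls by 60 dB at t = T60,
    which gives Delta = 3 * ln(10) / T60.
    """
    return 3.0 * np.log(10.0) / np.asarray(t60, dtype=float)


def smoothed_psd(stft_frames, alpha=0.9):
    """Recursively smoothed periodogram, shape (n_frames, n_bins).

    Hypothetical stand-in for the spectral variance of the reverberant,
    noise-free signal; alpha is an illustrative smoothing constant.
    """
    power = np.abs(stft_frames) ** 2
    psd = np.empty_like(power)
    acc = power[0]
    for l in range(power.shape[0]):
        acc = alpha * acc + (1.0 - alpha) * power[l]
        psd[l] = acc
    return psd


def lrsv_polack(reverb_psd, t60, fs, hop, n_late):
    """Model-based LRSV estimate under Polack's time-invariant RIR model.

    lrsv[l, k] = exp(-2 * Delta * n_late * hop / fs) * reverb_psd[l - n_late, k]

    t60 may be a scalar (same T60 in every bin) or an array of shape (n_bins,).
    Frames with l < n_late have no history and are set to zero.
    """
    if n_late < 1:
        raise ValueError("n_late must be at least one frame")
    reverb_psd = np.asarray(reverb_psd, dtype=float)
    delta = polack_decay_rate(t60)
    # Energy decay of the exponential envelope over n_late frame hops.
    gain = np.exp(-2.0 * delta * n_late * hop / fs)
    lrsv = np.zeros_like(reverb_psd)
    lrsv[n_late:] = gain * reverb_psd[:-n_late]
    return lrsv
```

For example, with T60 = 0.5 s, fs = 16 kHz, a 256-sample hop and `n_late = 4`, the delayed spectral variance is scaled by exp(-2 Delta * 4 * 256 / 16000) = 10^(-0.768), or about 0.17.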