Abstract

Many single- and multi-channel speech enhancement techniques, old and new, rely in one way or another on estimates of the noise power spectral density (PSD). For example, the classical Wiener filter requires that either the speech or the noise PSD be estimated. Typically, the noise PSD is estimated, as it is often easier to model and estimate than the speech PSD. As a result, much attention has been paid to this problem over the past couple of decades, with important scientific milestones being the minimum statistics (MS), the improved minima controlled recursive averaging (IMCRA), and the minimum mean squared error (MMSE) estimators. Despite leading to major progress, these estimators are rather ad hoc, which makes them difficult to tune and improve in a systematic manner. In this article, we analyze some of the common heuristics employed in such noise PSD estimators to put them on firmer mathematical ground. More specifically, we use the Gaussian stochastic volatility model and show that the MMSE noise PSD estimator can be interpreted as a special case thereof. Moreover, we analyze the related problem of speech presence probability (SPP) estimation and show that the SPP estimation performed in the MMSE noise PSD estimator can be interpreted as an SNR estimator in the context of the Gaussian stochastic volatility model.
