An Analysis of Traditional Noise Power Spectral Density Estimators Based on the Gaussian Stochastic Volatility Model

Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt

doi:10.1109/taslp.2023.3282107

Abstract

Many single- and multi-channel speech enhancement techniques, old and new, rely in one way or another on estimates of the noise power spectral density (PSD). For example, the classical Wiener filter requires that either the speech or the noise PSD be estimated. Typically, the noise PSD is estimated, as noise is often easier to model and estimate than speech. As a result, much attention has been paid to this important problem over the past couple of decades, with important scientific milestones being the minimum statistics (MS), the improved minima controlled recursive averaging (IMCRA), and the minimum mean squared error (MMSE) estimators. Despite leading to major progress, these estimators are rather ad hoc, which makes them difficult to tune and improve in a systematic manner. In this article, we analyze some of the common heuristics employed in such noise PSD estimators to put them on firmer mathematical ground. More specifically, we use the Gaussian stochastic volatility model and show that the MMSE noise PSD estimator can be interpreted as a special case thereof. Moreover, we analyze the related problem of speech presence probability (SPP) estimation and show that the SPP estimation performed in the MMSE noise PSD estimator can be interpreted as a signal-to-noise ratio (SNR) estimator in the context of the Gaussian stochastic volatility model.
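The abstract does not spell out how an SPP-driven MMSE noise PSD estimator or a Wiener filter actually uses these quantities, so the following is a minimal sketch, assuming the SPP-based MMSE noise power estimator in the style of Gerkmann and Hendriks with its commonly cited defaults (a fixed a priori SNR of 15 dB under speech presence, equal speech/noise priors, noise smoothing 0.8, SPP smoothing 0.9, and a 0.99 stagnation cap). It is not the article's stochastic volatility method. The function names `spp_mmse_noise_tracker` and `wiener_gain`, all parameter names, the gain floor of 0.1, and the crude maximum-likelihood SNR used inside the Wiener gain are hypothetical illustrative choices. Inputs are per-frame lists of periodogram values |Y(k, l)|^2, one per frequency bin.

```python
import math


def spp_mmse_noise_tracker(periodograms, initial_noise_psd,
                           xi_h1_db=15.0, prior_h1=0.5,
                           alpha_noise=0.8, alpha_spp=0.9, spp_cap=0.99):
    """Sketch of an SPP-based MMSE noise PSD tracker (assumed defaults).

    periodograms: list of frames; each frame is a list of |Y(k, l)|^2 per bin.
    initial_noise_psd: per-bin noise PSD used before the first frame.
    Returns (noise_psd_per_frame, spp_per_frame).
    """
    xi_h1 = 10.0 ** (xi_h1_db / 10.0)          # fixed a priori SNR assumed under H1
    prior_ratio = (1.0 - prior_h1) / prior_h1  # P(H0) / P(H1)
    noise_psd = list(initial_noise_psd)
    smoothed_spp = [0.5] * len(noise_psd)      # used only by the stagnation safeguard
    noise_track, spp_track = [], []
    for frame in periodograms:
        spps = []
        for k, y2 in enumerate(frame):
            # A posteriori SNR relative to the previous noise PSD estimate.
            post_snr = y2 / max(noise_psd[k], 1e-12)
            # A posteriori speech presence probability P(H1 | y).
            p = 1.0 / (1.0 + prior_ratio * (1.0 + xi_h1)
                       * math.exp(-post_snr * xi_h1 / (1.0 + xi_h1)))
            # Cap the SPP if speech has seemed present for too long, so the
            # noise estimate cannot freeze indefinitely.
            smoothed_spp[k] = alpha_spp * smoothed_spp[k] + (1.0 - alpha_spp) * p
            if smoothed_spp[k] > spp_cap:
                p = min(p, spp_cap)
            # MMSE estimate of the noise periodogram |N|^2 given the observation:
            # trust the observation when speech is absent, the old PSD otherwise.
            noise_periodogram = (1.0 - p) * y2 + p * noise_psd[k]
            # First-order recursive smoothing gives the noise PSD estimate.
            noise_psd[k] = (alpha_noise * noise_psd[k]
                            + (1.0 - alpha_noise) * noise_periodogram)
            spps.append(p)
        noise_track.append(list(noise_psd))
        spp_track.append(spps)
    return noise_track, spp_track


def wiener_gain(periodogram, noise_psd, gain_floor=0.1):
    """Wiener gain xi / (1 + xi), with xi a crude ML speech-to-noise PSD ratio."""
    gains = []
    for y2, n2 in zip(periodogram, noise_psd):
        # Speech PSD estimated as noisy periodogram minus noise PSD (clipped at 0).
        xi = max(y2 / max(n2, 1e-12) - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains
```

For example, five frames of unit-power noise followed by one frame with periodogram 100 in a single bin give an SPP of roughly 0.07 on the noise frames and essentially 1 on the loud frame, while the noise PSD estimate stays near 1. Because the noise periodogram estimate is weighted by the SPP, speech-dominated frames barely update the noise PSD, which is then all the Wiener gain needs besides the noisy observation.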

Full Text