Noise power spectral density (PSD) matrix estimation is one of the most important components of a multichannel blind speech extraction framework, as it largely determines the amount of residual noise at the output of a spatial filter. Optimality of well-known spatial filters, such as the multichannel Wiener filter, is only ensured if the PSD matrices of the noise and the desired speech are accurately estimated. In practical situations, where the noise is nonstationary, temporal averaging over time frames in which the desired signal is inactive does not track the noise PSD matrix fast enough, resulting in high residual noise at the spatial filter output. Therefore, approaches that estimate the PSD matrices using narrowband signal detection have been proposed. Following the well-known single- and multichannel minima-controlled recursive averaging (MCRA) approaches, this paper focuses on narrowband speech presence probability-based noise PSD matrix estimators, which are suitable for blind scenarios where the location and the propagation vector of the desired speech source are unknown. The main contributions of the paper are a maximum likelihood interpretation of the multichannel MCRA, and a coherent-to-diffuse ratio-based estimator of the a priori speech absence probability (SAP). The a priori SAP is a key parameter that determines the accuracy of the noise PSD matrix estimates in nonstationary scenarios. We confirm its importance and show that its control is crucial for source extraction in nonstationary environments.
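
To make the idea of a speech presence probability-based noise PSD matrix estimator concrete, the following is a minimal sketch of one recursive update for a single time-frequency bin. It assumes the commonly used form in which the smoothing factor is pushed toward 1 as the posterior speech presence probability grows, so the estimate is frozen when speech is likely present and updated when it is likely absent. The abstract does not state the paper's exact update rule, so the function name `update_noise_psd`, its parameters, and the default `alpha` are illustrative assumptions rather than the authors' implementation.

```python
def update_noise_psd(phi_prev, y, spp, alpha=0.9):
    """One SPP-controlled recursive update of a noise PSD matrix (hypothetical helper).

    phi_prev: previous M x M noise PSD matrix estimate, as nested lists of complex.
    y:        length-M list of complex microphone signals for one time-frequency bin.
    spp:      posterior speech presence probability in [0, 1] (assumed to be given).
    alpha:    base smoothing factor in [0, 1) (assumed value, not from the paper).
    """
    if not 0.0 <= spp <= 1.0:
        raise ValueError("spp must lie in [0, 1]")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    # Effective smoothing factor: equals alpha when speech is absent (spp = 0)
    # and 1 when speech is certainly present (spp = 1), which freezes the estimate.
    alpha_eff = alpha + (1.0 - alpha) * spp

    m = len(y)
    # Blend the previous estimate with the instantaneous outer product y y^H.
    return [
        [
            alpha_eff * phi_prev[i][j]
            + (1.0 - alpha_eff) * y[i] * y[j].conjugate()
            for j in range(m)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    phi = [[1 + 0j, 0j], [0j, 1 + 0j]]
    frame = [1 + 0j, 1j]
    # Speech likely absent: the estimate moves toward the current frame.
    print(update_noise_psd(phi, frame, spp=0.0, alpha=0.5))
    # Speech certainly present: the estimate is left unchanged.
    print(update_noise_psd(phi, frame, spp=1.0, alpha=0.5))
```

In this sketch the a priori SAP does not appear explicitly. It would enter through the computation of `spp`, which is why a poorly chosen a priori SAP translates directly into a poorly tracked noise PSD matrix.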