Non-Stationary Noise PSD Matrix Estimation for Multichannel Blind Speech Extraction

Maja Taseska, Emanuel A P Habets

doi:10.1109/taslp.2017.2750239

Abstract

Noise power spectral density (PSD) matrix estimation is one of the most important components of a multichannel blind speech extraction framework, as it largely determines the amount of residual noise at the output of a spatial filter. Optimality of well-known spatial filters, such as the multichannel Wiener filter, is ensured only if the PSD matrices of the noise and the desired speech are accurately estimated. In practical situations, where the noise is nonstationary, temporal averaging over time frames where the desired signal is inactive does not provide sufficiently fast tracking of the noise PSD matrix, resulting in high residual noise at the spatial filter output. Therefore, approaches that estimate the PSD matrices using narrowband signal detection have been proposed. Following the well-known single- and multichannel minima-controlled recursive averaging (MCRA) approaches, we focus in this paper on narrowband speech presence probability-based noise PSD matrix estimators, which are suitable for blind scenarios where the location and the propagation vector of the desired speech source are unknown. The main contributions of the paper are a maximum likelihood interpretation of the multichannel MCRA, and a coherent-to-diffuse ratio-based estimator of the a priori speech absence probability (SAP). The a priori SAP is a key parameter that determines the accuracy of the noise PSD matrix estimates in nonstationary scenarios. We confirm the importance of the a priori SAP and show that its control is crucial for source extraction in nonstationary environments.
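
To make the idea of speech presence probability-based recursive averaging concrete, the following minimal sketch shows one commonly used form of the update for a single time-frequency bin. It is an illustration under stated assumptions, not the estimator proposed in the paper: the function name `update_noise_psd`, the smoothing constant `alpha`, and its default value are hypothetical, and the speech presence probability `spp` is taken as given rather than computed from the a priori SAP.

```python
def update_noise_psd(phi_v, y, spp, alpha=0.9):
    """One SPP-weighted recursive update of a noise PSD matrix estimate.

    phi_v : M x M nested list, previous noise PSD matrix estimate
    y     : length-M list of complex microphone STFT coefficients
    spp   : speech presence probability for this bin, in [0, 1]
    alpha : smoothing constant in [0, 1) (illustrative value, an assumption)
    """
    # Effective smoothing factor: it approaches 1 as spp approaches 1,
    # so the estimate is frozen when speech is very likely present.
    alpha_eff = alpha + (1.0 - alpha) * spp
    m = len(y)
    # Blend the previous estimate with the instantaneous outer product y y^H.
    return [
        [alpha_eff * phi_v[i][j] + (1.0 - alpha_eff) * y[i] * y[j].conjugate()
         for j in range(m)]
        for i in range(m)
    ]


# Usage: two microphones, identity matrix as the previous estimate.
phi_prev = [[1 + 0j, 0j], [0j, 1 + 0j]]
frame = [2 + 0j, 0j]
phi_speech = update_noise_psd(phi_prev, frame, spp=1.0)             # no update
phi_noise = update_noise_psd(phi_prev, frame, spp=0.0, alpha=0.5)   # full update
```

With `spp=1.0` the previous estimate is returned unchanged, while with `spp=0.0` the update reduces to ordinary recursive averaging with constant `alpha`. This is why errors in the speech presence probability translate directly into how quickly the noise estimate can track nonstationary noise.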