Single-channel speech enhancement with correlated spectral components: Limits-potential

Pejman Mowlaee, Johannes K.W Stahl

doi:10.1016/j.specom.2020.05.002

Abstract

In this paper, we investigate single-channel speech enhancement algorithms that operate in the short-time Fourier transform domain and take into account dependencies across frequency. Once inter-frequency dependencies are allowed for, the minimum mean square error optimal estimates of the short-time Fourier transform expansion coefficients are, in general, functions of complex-valued covariance matrices. These covariance matrices are not known a priori and have to be estimated from the observed data. This work analyzes how that estimation affects the respective single-channel speech enhancement algorithms. We propose a statistical model that circumvents the need to estimate complex-valued second-order statistics and derive a linear multidimensional short-time spectral amplitude estimator motivated by the model's assumptions. Further, we provide empirical evidence for the assumptions that form the basis of this model. We evaluate the potential of taking into account inter-frequency dependencies for single-channel speech enhancement and subsequently compare the estimator resulting from the proposed statistical model to relevant benchmark methods. The results indicate that estimators that consider inter-frequency dependencies are capable of pushing the limits of standard approaches in terms of joint speech quality and intelligibility improvement when the second-order statistics are estimated from isolated speech data. The proposed linear multidimensional short-time spectral amplitude estimator preserves this trend in fully blind scenarios.
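
To make the role of the covariance matrices concrete, the following is a minimal sketch of a textbook linear MMSE (Wiener-type) estimate applied jointly to all frequency bins of one frame. It is not the estimator proposed in the paper. It assumes zero-mean speech and noise that are mutually uncorrelated and have known complex-valued covariance matrices; the function name `linear_mmse_estimate` and all variable names are hypothetical.

```python
import numpy as np

def linear_mmse_estimate(noisy_frame, cov_speech, cov_noise):
    """Joint linear MMSE estimate of the speech coefficients in one frame.

    noisy_frame : (K,) complex vector of noisy STFT coefficients
    cov_speech  : (K, K) Hermitian covariance of the speech coefficients (assumed known)
    cov_noise   : (K, K) Hermitian covariance of the noise coefficients (assumed known)
    """
    cov_noisy = cov_speech + cov_noise  # valid if speech and noise are uncorrelated
    # Gain matrix W = C_s C_y^{-1}, computed with a linear solve instead of an inverse:
    # W^H = C_y^{-H} C_s^H, so W = solve(C_y^H, C_s^H)^H.
    gain = np.linalg.solve(cov_noisy.conj().T, cov_speech.conj().T).conj().T
    return gain @ noisy_frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 4  # toy number of frequency bins
    # Toy Hermitian, positive-definite speech covariance with off-diagonal terms.
    A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
    cov_speech = A @ A.conj().T + np.eye(K)
    cov_noise = np.eye(K, dtype=complex)  # white noise, uncorrelated across bins
    noisy_frame = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    print(linear_mmse_estimate(noisy_frame, cov_speech, cov_noise))
```

When both covariance matrices are diagonal, the gain matrix is diagonal as well and the estimate reduces to the familiar per-bin Wiener gain, which is the standard approach that ignores inter-frequency dependencies.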

Full Text