The a priori signal-to-noise ratio (SNR) plays an important role in many speech enhancement algorithms. In this paper, we present a data-driven approach to a priori SNR estimation. It may be used with a wide range of speech enhancement techniques, such as, e.g., the minimum mean square error (MMSE) (log) spectral amplitude estimator, the super Gaussian joint maximum a posteriori (JMAP) estimator, or the Wiener filter. The proposed SNR estimator employs two trained artificial neural networks, one for speech presence, one for speech absence. The classical decision-directed a priori SNR estimator by Ephraim and Malah is broken down into its two additive components, which now represent the two input signals to the neural networks. Both output nodes are combined to represent the new a priori SNR estimate. As an alternative to the neural networks, also simple lookup tables are investigated. Employment of these data-driven nonlinear a priori SNR estimators reduces speech distortion, particularly in speech onset, while retaining a high level of noise attenuation in speech absence.
Read full abstract