Abstract

In this paper, a new approach is presented for estimating the long-term speech-to-noise ratio (SNR) in individual frequency bands that is based on methods known from automatic speech recognition (ASR). It uses a model of auditory perception as the front end, physiologically and psychoacoustically motivated sigma–pi cells as secondary features, and a linear or non-linear neural network as the classifier. A non-linear neural network back end is capable of estimating the SNR in time segments of 1 s with a root-mean-square error of 5.68 dB on unknown test material. This performance is obtained on a large set of natural noise types, including non-stationary signals and alarm sounds, although the SNR estimation works best for more stationary types of noise. The individual components of the estimation algorithm are examined with respect to their importance for the estimation accuracy. The algorithm presented in this paper yields results that are similar to or better than those of other short-term SNR estimation methods known from the literature, at comparable computational effort. Because the new approach is based purely on slow spectro-temporal modulations, it is a valuable contribution to both digital hearing aids and ASR systems.
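The abstract only names the processing stages; the exact auditory front end, sigma–pi cell configuration, and network topology are specified in the paper itself and are not reproduced here. The following is a minimal sketch, assuming that each sigma–pi secondary feature is formed as the product of two front-end feature values taken at different band and time-lag positions and averaged over a 1 s segment, and that a simple linear back end (standing in for the paper's neural network) maps these features to one SNR estimate per band. All function names, band counts, and lag values below are illustrative assumptions, not the authors' implementation.

import numpy as np

def sigma_pi_features(spec, pairs):
    # spec: (n_frames, n_bands) slowly varying spectro-temporal features
    # from an auditory-model front end (assumed input representation).
    # pairs: list of (time_lag, band_i, band_j) tuples defining which two
    # feature values each sigma-pi cell multiplies.
    n_frames, _ = spec.shape
    feats = []
    for lag, bi, bj in pairs:
        a = spec[: n_frames - lag, bi]   # value at (t, band_i)
        b = spec[lag:, bj]               # value at (t + lag, band_j)
        feats.append(np.mean(a * b))     # "pi" (product), then "sigma" (average over the segment)
    return np.array(feats)

def estimate_band_snr(feats, weights, bias):
    # Linear regression back end: maps secondary features to one
    # long-term SNR value (in dB) per frequency band.
    return feats @ weights + bias

# Toy usage on random data: a 1 s segment at an assumed 100 frames/s with 19 bands.
rng = np.random.default_rng(0)
spec = rng.random((100, 19))
pairs = [(lag, i, (i + 1) % 19) for lag in (2, 5, 10) for i in range(19)]
feats = sigma_pi_features(spec, pairs)
weights = rng.normal(scale=0.1, size=(len(pairs), 19))
bias = np.zeros(19)
snr_db = estimate_band_snr(feats, weights, bias)
print(snr_db.shape)  # (19,): one SNR estimate per frequency band

In the paper, the weights of the back end are learned from training material with known SNR; the random weights above merely make the sketch self-contained and runnable.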
