The subspace approach as a first stage in speech enhancement

E Vera

doi:10.1109/tla.2011.6030981

Abstract

To improve the performance of communication systems and of other applications related to speech or speaker recognition, it is usual to include a speech enhancement preprocessing stage before the signal enters the specific system. Although the signal subspace approach is very useful, it becomes less effective when the signal-to-noise ratio (SNR) is lower than 10 dB. This is due to possible inaccuracies in estimating the noise, whose effect on the different phonemes and frequency bands is far from uniform, even for white noise. A critical point is estimating the signal subspace dimension. It depends not only on the noise variance but also on the SNR, which varies across the temporal segments of speech as well as across frequency bands. It is therefore proposed to work on temporal frames in each critical band and to estimate the signal dimension using as reference the noise variance scaled by a factor that depends on the SNR of the particular segment being considered. Weak signal components that could otherwise be eliminated are thus preserved. The projection of the noisy signal onto the signal subspace can be regarded as a first stage in the speech enhancement process, which has to be complemented with additional techniques.
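The dimension estimate described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it handles a single time frame with no critical-band split, and the function names, the frame SNR estimate, the model order, and in particular the linear SNR-to-factor mapping in `snr_factor` are all hypothetical choices made for illustration. The signal dimension is taken as the number of covariance eigenvalues that exceed the noise variance scaled by the SNR-dependent factor, and the frame is then projected onto the corresponding eigenvectors.

```python
import numpy as np


def snr_factor(snr_db, low=1.0, high=2.0, snr_min=0.0, snr_max=20.0):
    """Hypothetical mapping from frame SNR (dB) to a threshold factor.

    Assumption: a smaller factor at low SNR keeps weak signal components.
    The abstract does not specify the actual mapping.
    """
    t = np.clip((snr_db - snr_min) / (snr_max - snr_min), 0.0, 1.0)
    return low + t * (high - low)


def project_on_signal_subspace(frame, noise_var, order=20):
    """Project one noisy frame onto its estimated signal subspace.

    Returns the enhanced frame and the estimated subspace dimension.
    """
    frame = np.asarray(frame, dtype=float)

    # Trajectory matrix: each row is a sliding window of length `order`.
    X = np.lib.stride_tricks.sliding_window_view(frame, order)

    # Sample covariance and its eigendecomposition, largest eigenvalue first.
    R = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Rough frame SNR estimate (assumes the noise variance is known).
    signal_power = max(np.mean(frame ** 2) - noise_var, 1e-12)
    snr_db = 10.0 * np.log10(signal_power / noise_var)

    # Signal dimension: eigenvalues above the SNR-scaled noise variance.
    threshold = snr_factor(snr_db) * noise_var
    dim = int(np.sum(eigvals > threshold))

    # Orthogonal projection of every window onto the signal subspace.
    V = eigvecs[:, :dim]
    X_hat = X @ V @ V.T

    # Rebuild the frame by averaging the overlapping windows.
    out = np.zeros(frame.size)
    count = np.zeros(frame.size)
    for i in range(X_hat.shape[0]):
        out[i:i + order] += X_hat[i]
        count[i:i + order] += 1.0
    return out / count, dim
```

For a frame `noisy` with known noise variance, `enhanced, dim = project_on_signal_subspace(noisy, noise_var=0.01)` returns the projected frame and the estimated dimension. In the scheme the abstract describes, this step would run per critical band and per frame, and further enhancement stages would follow it.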
