Robust speech recognition by using spectral subtraction with noise peak shifting

Peng Dai, Ing Yann Soon

doi:10.1049/iet-spr.2012.0357

Abstract

This study proposes a novel technique that recovers the temporal structure of the speech power spectrum. The histogram of the average speech log power spectrum shows that noise contamination shifts the noise peak, which in turn degrades the performance of speech recognition systems. A two-step scheme is proposed to weaken the effects of noise by first reducing the noise variance and then shifting the noise mean. The proposed algorithm consists of two parts, two-dimensional smoothing and controlled noise subtraction, hence the name SNS. It resolves the discontinuity in the speech probability distribution function caused by traditional spectral-subtraction-type algorithms. In contrast to clean speech estimation methods, the proposed algorithm does not need a prior speech/noise statistical model, which makes it simple but effective. The proposed filter is evaluated on the AURORA2 database, where it achieves a recognition accuracy of 88.59% on noisy speech (averaged over signal-to-noise ratios of 0-20 dB). It is compared against eight state-of-the-art speech recognition algorithms and overall produces significant improvements over them.
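The two-step idea (reduce the noise variance, then shift the noise mean) can be illustrated with a minimal sketch. The abstract does not specify the smoothing kernel, the noise estimator, or the subtraction rule, so everything below is an assumption made for illustration only: a 3x3 mean filter over time and frequency, a noise estimate taken from the first few frames, and hypothetical parameters `alpha` (fraction of noise subtracted) and `floor` (lower bound that keeps the output positive and avoids hard zeros). It is not the authors' implementation.

```python
import numpy as np


def smooth_2d(power_spec, size=3):
    """Mean-filter a (frames, bins) power spectrogram over time and frequency.

    Assumed kernel: a size x size box filter with edge padding (size odd).
    """
    pad = size // 2
    padded = np.pad(power_spec, pad, mode="edge")
    n_frames, n_bins = power_spec.shape
    out = np.zeros((n_frames, n_bins), dtype=float)
    # Sum the shifted copies of the padded array, then normalise.
    for dt in range(size):
        for df in range(size):
            out += padded[dt:dt + n_frames, df:df + n_bins]
    return out / (size * size)


def controlled_subtract(power_spec, noise_est, alpha=0.5, floor=0.1):
    """Subtract a fraction of the noise estimate, never dropping below a floor.

    alpha and floor are hypothetical control parameters, not values from the paper.
    """
    return np.maximum(power_spec - alpha * noise_est, floor * noise_est)


def sns_sketch(power_spec, noise_frames=10, size=3, alpha=0.5, floor=0.1):
    """Step 1: smooth (reduces noise variance). Step 2: subtract (shifts noise mean)."""
    smoothed = smooth_2d(power_spec, size)
    # Assumption: the first noise_frames frames contain noise only.
    noise_est = smoothed[:noise_frames].mean(axis=0, keepdims=True)
    return controlled_subtract(smoothed, noise_est, alpha, floor)
```

Because only a fraction of the noise estimate is removed and the result is floored at a positive value, the output never collapses to zero. Under these assumptions, that is one way to avoid the kind of discontinuity that full subtraction with zero clipping would introduce.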
