Leveraging gain normalization for sub-band temporal features in noise-robust speech recognition

Hao-Teng Fan, Jeih-Weih Hung

doi:10.1109/fskd.2012.6234339

Abstract

In this paper, we propose performing sub-band division via the discrete wavelet transform (DWT) before gain normalization (GN) when producing speech features. In the presented approach, we apply the DWT to decompose the temporal-domain cepstral feature sequence into sub-band streams, and then perform gain normalization on each sub-band feature stream. Finally, the new feature stream is obtained by applying the inverse DWT to all sub-band streams. Compared with gain normalization performed directly on the original full-band stream, the presented approach can deal with the sub-band distortions individually and is therefore expected to be more noise-robust. On the Aurora-2 database and task, the new sub-band GN outperforms the baseline process and the original full-band GN by 65.51% and 18.20%, respectively, in relative word error reduction.
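
The three steps described above (DWT decomposition, per-sub-band gain normalization, inverse DWT) can be illustrated with a minimal sketch. Several choices here are assumptions that the abstract does not specify: a one-level Haar wavelet, an even-length input sequence, and a gain normalization defined as mean removal followed by division by the dynamic range (maximum minus minimum). The function names are hypothetical and the code is not the authors' implementation.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT (assumed wavelet): returns (approximation, detail)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-length sequence")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def gain_normalize(stream, eps=1e-12):
    """Assumed GN: subtract the mean, then divide by (max - min)."""
    mean = sum(stream) / len(stream)
    centered = [v - mean for v in stream]
    gain = max(centered) - min(centered)
    if gain < eps:
        # A constant stream has no dynamic range; leave it centered.
        return centered
    return [v / gain for v in centered]


def subband_gn(x):
    """Sub-band GN: decompose, normalize each sub-band, reconstruct."""
    approx, detail = haar_dwt(x)
    return haar_idwt(gain_normalize(approx), gain_normalize(detail))


def fullband_gn(x):
    """Full-band GN for comparison: normalize the original stream directly."""
    return gain_normalize(x)
```

In this sketch, `x` stands for the time trajectory of a single cepstral coefficient over one utterance; each coefficient would be processed independently. Because each sub-band receives its own gain, a distortion that mainly affects one sub-band does not dictate the scaling of the other, which is the intuition behind treating sub-band distortions individually.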
