Abstract

In this paper, a set of features derived by filtering and spectral peak extraction in autocorrelation domain are proposed. We focus on the effect of the additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, initially, the autocorrelation sequence of a speech signal frame is computed. Filtering of the autocorrelation of speech signal is carried out in the second step, and then, the short-time power spectrum of speech is obtained from the speech signal through the fast Fourier transform. The power spectrum peaks are then calculated by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are then projected onto the mel-scale and pass the filter bank. Finally, a set of cepstral coefficients are derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions will be shown in this paper through a number of speech recognition experiments. A task of multi-speaker isolated-word recognition and another one of multi-speaker continuous speech recognition with various artificially added noises such as factory, babble, car and F16 were used in these experiments. Also, a set of experiments were carried out on Aurora 2 task. Experimental results show significant improvements under noisy conditions in comparison to the results obtained using traditional feature extraction methods. We have also reported the results obtained by applying cepstral mean normalization on the methods to get robust features against both additive noise and channel distortion.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call