Integrating Articulatory Based Features with Auditory Based Features for Robust Stressed Speech Recognition

Tin Lay Nwe, Ye Wang, Haizhou Li

doi:10.1109/icics.2005.1689273

Abstract

Intra-speaker variations due to perceptually induced stress or emotion adversely affect the performance of speech recognition systems. In this paper, we combine auditory-based features (Mel frequency cepstral coefficients and linear predictive cepstral coefficients) with articulatory-based (voicedness) features for robust speech recognition. Voicedness features are derived from linear fast Fourier transform (FFT) spectra and from nonlinear FFT spectra based on the Teager energy operator (TEO). Nonlinear properties are analyzed in both the time and frequency domains. In addition, we investigate the sensitivity of all these FFT spectra to stress and evaluate the performance obtained with each individual FFT spectrum. The system is tested using stressed speech data from the Speech Under Simulated and Actual Stress (SUSAS) database. The results show that articulatory-based features help to improve system performance. Furthermore, a significant performance improvement is observed when using the FFT spectrum that is less sensitive to stress.
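As a rough illustration of the two kinds of spectra mentioned in the abstract, the sketch below computes a linear magnitude spectrum of a frame and a spectrum of its Teager energy profile. It uses the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1] x[n+1]. The abstract does not say how the nonlinear spectra or the voicedness features are actually constructed, so taking the spectrum of the time-domain TEO profile is an assumption, and the function names, the test frame, and its parameters are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def magnitude_spectrum(x):
    """DFT magnitudes for bins 0..N/2.

    Naive O(N^2) evaluation for clarity; an FFT returns the same values.
    """
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# Hypothetical test frame (not from the paper): one sinusoid with
# amplitude A and digital frequency omega in radians per sample.
A, omega = 0.5, 0.3
frame = [A * math.cos(omega * n) for n in range(64)]

# "Linear" spectrum: magnitude spectrum of the frame itself.
linear_spectrum = magnitude_spectrum(frame)

# Assumed "nonlinear" spectrum: magnitude spectrum of the TEO profile.
teo_profile = teager_energy(frame)
nonlinear_spectrum = magnitude_spectrum(teo_profile)
```

For a pure sinusoid the TEO profile is constant and equal to A^2 sin^2(omega), so its spectrum collapses onto the DC bin, while the linear spectrum peaks near the bin of the sinusoid. Real speech frames would normally be windowed first and would give a time-varying TEO profile.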
