Abstract

Intra-speaker variations due to perceptually induced stress or emotion adversely affect the performance of speech recognition systems. In this paper, we combine auditory-based features (Mel-frequency cepstral coefficients and linear predictive cepstral coefficients) with articulatory-based (voicedness) features for robust speech recognition. Voicedness features are derived from linear and Teager energy operator (TEO) based nonlinear fast Fourier transform (FFT) spectra. Nonlinear properties are analyzed in both the time and frequency domains. In addition, we investigate the sensitivity of each of these FFT spectra to stress and evaluate the performance obtained with each spectrum individually. The system is tested on stressed speech data from the Speech Under Simulated and Actual Stress (SUSAS) database. The results show that articulatory-based features help to improve system performance. Furthermore, a significant performance improvement is observed when using the FFT spectrum that is less sensitive to stress.
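As a point of reference for the TEO mentioned above, the following is a minimal sketch of the standard discrete-time Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], applied in the time domain. It is not taken from the paper: the abstract does not specify how the nonlinear FFT spectra or the voicedness features are computed, and the function name `teager_energy` and the test tone parameters are assumptions made for illustration only.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one per interior sample
    (an empty list if x has fewer than three samples).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal (not from the paper): a pure tone A * cos(omega * n).
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(64)]

psi = teager_energy(tone)

# For a pure tone the operator output is constant and equals (A * sin(omega))^2,
# so it depends on both the amplitude and the frequency of the tone.
expected = (A * math.sin(omega)) ** 2
max_error = max(abs(v - expected) for v in psi)
```

For a pure tone the output is constant and tracks both amplitude and frequency, which is one common reason the operator is described as nonlinear and used to characterize changes in speech production.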
