Abstract

This study focuses on automatic speech emotion recognition, and in particular on the effect of additive noise and reverberation on recognition accuracy. The clean emotional speech is produced by four professional actors, who simulate the neutral, joy, anger, and sadness emotions. To produce noisy emotional speech data, white Gaussian noise is superimposed onto the clean speech at several signal-to-noise ratio (SNR) levels. The reverberant emotional speech data are produced by convolving the clean speech with impulse responses recorded in several environments with different reverberation times. The four emotions are recognized using i-vectors together with probabilistic linear discriminant analysis (PLDA), a combination widely used in speaker recognition and adapted here for speech emotion recognition. When noisy or reverberant emotional speech is recognized using models trained on clean speech, recognition rates drop significantly compared to clean test data. To address this problem, a multi-style training method is applied, which uses training data spanning several SNR levels (different from the SNR level of the test data, and therefore SNR-independent) or several reverberation times. With multi-style training, recognition rates in noisy or reverberant environments increase significantly, and the differences from the clean case are not statistically significant. Furthermore, the i-vector-based classification method outperforms a baseline method based on Gaussian mixture models (GMMs).
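The two data-simulation steps described above, superimposing white Gaussian noise at a target SNR and convolving clean speech with a recorded impulse response, can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation; the function names, the fixed random seed, and the synthetic impulse response are assumptions for the example.

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng=None):
    """Superimpose white Gaussian noise onto clean speech at a target SNR (dB)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(len(clean))
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / p_noise_scaled = 10^(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def add_reverb(clean, rir):
    """Simulate reverberation by convolving clean speech with a room
    impulse response (RIR); the tail is trimmed to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

# Example: one second of a 440 Hz tone at 16 kHz as stand-in "clean speech".
fs = 16000
clean = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)
noisy = add_noise_at_snr(clean, snr_db=10.0)

# Toy RIR: direct path plus one echo (a real study would use measured RIRs).
rir = np.zeros(100)
rir[0], rir[50] = 1.0, 0.5
reverberant = add_reverb(clean, rir)
```

Because the noise is rescaled from its measured sample power, the realized SNR of `noisy` matches the requested 10 dB; repeating this over a grid of SNR levels (or RIRs with different reverberation times) yields the kind of multi-condition corpus used for the multi-style training described above.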
