Abstract

We propose to integrate an attention mechanism into deep recurrent neural network (RNN) models for speech emotion recognition, based on the intuition that emphasizing the expressive parts of the speech signal benefits emotion recognition. By introducing an attention mechanism, the system learns to focus on the more robust or informative segments of the input signal. The proposed recognition model is evaluated on the FAU-Aibo tasks as defined in the Interspeech 2009 Emotion Challenge. Our baseline deep RNN model achieves a 37.0% unweighted average (UA) recall rate, on par with the official HMM baseline system for the dynamic modeling framework. Adding the attention mechanism on top of the baseline deep RNN model raises this to a 46.3% UA recall rate. To the best of our knowledge, this is the best UA recall rate achieved on the FAU-Aibo tasks within the dynamic modeling framework.
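The core idea — learning to weight the more informative frames of an utterance before classification — can be sketched as attention-weighted pooling over RNN hidden states. This is a minimal NumPy illustration, not the paper's exact parameterization (which the abstract does not specify); the projection `W`, scoring vector `v`, and the additive-attention form are assumptions for illustration.

```python
import numpy as np

def attention_pool(H, W, v):
    """Attention-weighted pooling over a sequence of hidden states.

    H: (T, d) RNN hidden states, one row per speech frame.
    W: (d, d) learned projection; v: (d,) learned scoring vector.
    Returns the pooled utterance vector and the per-frame weights.
    (Additive-attention form assumed for illustration.)
    """
    scores = np.tanh(H @ W) @ v           # (T,) unnormalized frame scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over time steps
    pooled = weights @ H                  # (d,) weighted sum of states
    return pooled, weights

# Toy example: 50 frames of 8-dimensional hidden states.
rng = np.random.default_rng(0)
T, d = 50, 8
H = rng.normal(size=(T, d))
W = rng.normal(size=(d, d))
v = rng.normal(size=(d,))
pooled, weights = attention_pool(H, W, v)
```

The pooled vector would then feed a softmax emotion classifier; frames the model deems expressive receive larger weights, while the rest are effectively down-weighted.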
