Abstract

We propose to integrate an attention mechanism into deep recurrent neural network (RNN) models for speech emotion recognition, based on the intuition that emphasizing the expressive parts of the speech signal benefits emotion recognition. By introducing an attention mechanism, the system learns to focus on the more robust or informative segments of the input signal. The proposed recognition model is evaluated on the FAU-Aibo tasks as defined in the Interspeech 2009 Emotion Challenge. Our baseline deep RNN model achieves a 37.0% unweighted average (UA) recall rate, on par with the official HMM baseline system for the dynamic modeling framework. Adding the attention mechanism on top of the baseline deep RNN model raises this to a 46.3% UA recall rate. To the best of our knowledge, this is the best UA recall rate achieved on the FAU-Aibo tasks within the dynamic modeling framework.
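The core idea — learning to weight the more informative frames of an utterance before classification — can be sketched as attention-weighted pooling over RNN hidden states. This is a minimal NumPy illustration, not the paper's exact parameterization (which the abstract does not specify); the projection `W`, scoring vector `v`, and the additive-attention form are assumptions for illustration.

```python
import numpy as np

def attention_pool(H, W, v):
    """Attention-weighted pooling over a sequence of hidden states.

    H: (T, d) RNN hidden states, one row per speech frame.
    W: (d, d) learned projection; v: (d,) learned scoring vector.
    Returns the pooled utterance vector and the per-frame weights.
    (Additive-attention form assumed for illustration.)
    """
    scores = np.tanh(H @ W) @ v           # (T,) unnormalized frame scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over time steps
    pooled = weights @ H                  # (d,) weighted sum of states
    return pooled, weights

# Toy example: 50 frames of 8-dimensional hidden states.
rng = np.random.default_rng(0)
T, d = 50, 8
H = rng.normal(size=(T, d))
W = rng.normal(size=(d, d))
v = rng.normal(size=(d,))
pooled, weights = attention_pool(H, W, v)
```

The pooled vector would then feed a softmax emotion classifier; frames the model deems expressive receive larger weights, while the rest are effectively down-weighted.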
