Abstract

Emotion recognition in conversations (ERC) is challenging due to the dynamics and complexity of emotions in dialogue. Most existing ERC studies focus on modeling the temporal dimension, such as context-sensitive dependencies, while ignoring spatial relationships between utterances. In this paper, we propose a dual-channel information fusion network based on a voting mechanism, in which modal information is modeled in two dimensions: distance dependence is constructed in the spatial dimension, and context dependence is constructed in the temporal dimension. Our aim is to extract more comprehensive and accurate emotion features so that utterance-level emotions can be recognized from limited data. In addition, we discard traditional modal fusion methods in favor of a fusion strategy based on a voting mechanism, which significantly accelerates model convergence and improves recognition performance. We conducted experiments on two benchmark datasets, IEMOCAP and MELD, and the results demonstrate the superiority of the proposed network for emotion recognition.
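The abstract does not specify the implementation, so the following is only a minimal sketch of the two ideas it names: a temporal channel for context dependence, a spatial channel for distance dependence between utterances, and a voting-based fusion of the two channels' predictions. PyTorch, the module name `DualChannelVotingFusion`, the use of a bidirectional GRU for the temporal channel, self-attention for the spatial channel, and a learned soft-voting weight are all assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DualChannelVotingFusion(nn.Module):
    """Illustrative sketch (not the paper's model): two channels produce
    per-utterance emotion logits, combined by a learned soft vote."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_emotions: int):
        super().__init__()
        # Temporal channel: context dependence along the utterance sequence.
        self.temporal = nn.GRU(feat_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.temporal_head = nn.Linear(2 * hidden_dim, num_emotions)
        # Spatial channel: distance dependence between utterances,
        # approximated here with self-attention over the whole dialogue.
        self.spatial = nn.MultiheadAttention(feat_dim, num_heads=4,
                                             batch_first=True)
        self.spatial_head = nn.Linear(feat_dim, num_emotions)
        # Voting weights: one scalar per channel, softmax-normalized.
        self.vote = nn.Parameter(torch.zeros(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, feat_dim) utterance features.
        t_out, _ = self.temporal(x)
        s_out, _ = self.spatial(x, x, x)
        t_logits = self.temporal_head(t_out)  # (batch, seq_len, num_emotions)
        s_logits = self.spatial_head(s_out)
        w = torch.softmax(self.vote, dim=0)   # soft vote between channels
        return w[0] * t_logits + w[1] * s_logits

# Example: per-utterance logits for two 8-utterance dialogues.
model = DualChannelVotingFusion(feat_dim=128, hidden_dim=64, num_emotions=6)
logits = model(torch.randn(2, 8, 128))  # -> (2, 8, 6)
```

A soft vote of this kind keeps each channel's classifier independent until the final combination, which is one plausible reading of why the authors report faster convergence than feature-level fusion.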
