Abstract

Multimodal Emotion Recognition in Conversation (ERC) is a challenging multi-class classification task that requires recognizing the emotions of multiple speakers from text, audio, video, and other modalities. ERC has received considerable attention from researchers due to its potential applications in opinion mining, advertising, and healthcare. However, the syntactic structure of the text itself has not been considered in previous studies. Taking this into account, this paper proposes a conversational affective analysis model (DSAGCN) that combines dependency syntactic analysis with graph convolutional neural networks. Since words that reflect emotional polarity are usually concentrated in limited regions of an utterance, the DSAGCN model first employs a self-attention mechanism to capture the most informative words in the dialogue context and obtain a more accurate vector representation of the emotional semantics. Then, multimodal sentiment relationship graphs are constructed based on speaker relationships and dependency syntactic relations. Finally, a graph convolutional neural network is used to perform multimodal emotion recognition. In extensive experiments on two real datasets, IEMOCAP and MELD, the DSAGCN model outperforms existing models in terms of average accuracy and F1 score for multimodal emotion recognition, especially for emotions such as “happiness” and “anger”. Thus, dependency syntactic analysis and the self-attention mechanism can enhance the model’s ability to understand emotions.
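The abstract only outlines the DSAGCN pipeline (self-attention over utterance features, a relation graph built from speaker and dependency links, and a GCN classifier); it does not give the implementation. The following is a minimal PyTorch sketch of that pipeline under stated assumptions: all class names, dimensions, and the toy adjacency below are illustrative and are not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Scaled dot-product self-attention over utterance features."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (num_utterances, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.t() / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v   # context-weighted features


class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        a = adj + torch.eye(adj.size(0))                   # add self-loops
        d_inv_sqrt = a.sum(dim=1).clamp(min=1e-6) ** -0.5  # degree normalization
        a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return F.relu(self.w(a_norm @ h))


class ERCSketch(nn.Module):
    """Attention -> relation graph -> GCN -> per-utterance emotion logits."""
    def __init__(self, feat_dim, hidden_dim, num_emotions):
        super().__init__()
        self.attn = SelfAttention(feat_dim)
        self.gcn = GCNLayer(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, utterance_feats, adj):
        h = self.attn(utterance_feats)   # emphasize emotion-bearing regions
        h = self.gcn(h, adj)             # propagate over speaker/syntactic relations
        return self.classifier(h)        # one emotion distribution per utterance


# Toy usage: 5 utterances with fused multimodal features; in the actual model the
# adjacency would be derived from speaker relations and dependency-parse links.
feats = torch.randn(5, 128)
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()      # symmetric relation graph
logits = ERCSketch(128, 64, 6)(feats, adj)
print(logits.shape)                      # torch.Size([5, 6])
```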
