Abstract

Multimodal data, when available, is key to enhanced emotion recognition in conversation: the text, audio, and video of a dialogue can reinforce and complement one another in analyzing speakers’ emotions. However, effectively fusing multimodal features to capture the detailed contextual information in conversations remains very challenging. In this work, we focus on the dynamic interactions that occur during information fusion and propose a Dynamic Interactive Multiview Memory Network (DIMMN) to integrate interaction information for emotion recognition. Specifically, DIMMN fuses information from multiple views, each corresponding to a different combination of modalities. We design multiview attention layers that enable the model to mine cross-modal dynamic dependencies between modality groups during dynamic modal interaction. To capture long-term dependencies, temporal convolutional networks are introduced to synthesize the contextual information of each individual speaker. Gated recurrent units and memory networks then model the global conversation, detecting contextual dependencies in multi-turn, multi-speaker emotional interactions. Experimental results on IEMOCAP and MELD demonstrate that DIMMN achieves performance better than or comparable to state-of-the-art methods, with accuracies of 64.7% and 60.6%, respectively.
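
For concreteness, the following is a minimal PyTorch sketch of the pipeline described above: multiview attention over modality combinations, a temporal convolution for per-speaker context, and a GRU over the global conversation. The class names, the way views are constructed, the dimensions, and all hyperparameters are illustrative assumptions, and the explicit memory-network component is omitted; this is not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class MultiViewAttention(nn.Module):
    """Attention over modality 'views' (assumed here to be simple
    modality combinations); the paper's exact view design differs."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, views):            # views: (B, n_views, dim)
        fused, _ = self.attn(views, views, views)
        return fused.mean(dim=1)         # (B, dim)


class DIMMNSketch(nn.Module):
    """Sketch of the abstract's pipeline: multiview fusion ->
    speaker-level TCN -> GRU over the conversation (memory network omitted)."""
    def __init__(self, dim=128, n_classes=6):
        super().__init__()
        self.view_fusion = MultiViewAttention(dim)
        # Dilated temporal convolution for longer-range per-speaker context
        self.tcn = nn.Conv1d(dim, dim, kernel_size=3, padding=2, dilation=2)
        # GRU models the global, multi-speaker conversation
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text, audio, video):    # each: (B, T, dim)
        B, T, D = text.shape
        # Assumed views: the three unimodal features plus their average
        views = torch.stack(
            [text, audio, video, (text + audio + video) / 3], dim=2
        )                                      # (B, T, 4, D)
        fused = self.view_fusion(views.reshape(B * T, 4, D)).reshape(B, T, D)
        ctx = self.tcn(fused.transpose(1, 2))[..., :T].transpose(1, 2)
        out, _ = self.gru(ctx)                 # global conversational context
        return self.classifier(out)            # per-utterance emotion logits


# Usage on random features: 2 dialogues, 10 utterances, 128-dim per modality
model = DIMMNSketch()
logits = model(torch.randn(2, 10, 128),
               torch.randn(2, 10, 128),
               torch.randn(2, 10, 128))
print(logits.shape)  # torch.Size([2, 10, 6])
```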
