Abstract

Emotion dynamics formulates principles that explain how emotions fluctuate during conversations. Recent studies explore emotion dynamics through self and inter-personal dependencies, but ignore the temporal and spatial dependencies that arise in multi-modal conversations. To address this issue, we extend the concept of emotion dynamics to multi-modal settings and propose a Dialogue Transformer that simultaneously models intra-modal and inter-modal emotion dynamics. Specifically, the intra-modal emotion dynamics not only captures the temporal dependency but also satisfies the context preference of each single modality. The inter-modal emotion dynamics handles multi-grained spatial dependency across all modalities. Our models outperform the state of the art by a margin of 4%-16% on most metrics across three benchmark datasets.
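
A minimal sketch of how the intra-modal part of such a model could be organised, assuming one Transformer encoder per modality and approximating each modality's context preference with a modality-specific attention window. The class names, window sizes, and dimensions are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch (not the authors' released code): per-modality Transformer
# encoders for intra-modal emotion dynamics. Each modality's "context preference"
# is approximated here by a modality-specific attention window size.
import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask that lets position i attend only to positions within `window`."""
    idx = torch.arange(seq_len)
    dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()
    return dist > window  # True = masked out


class IntraModalEncoder(nn.Module):
    """One Transformer encoder per modality, each with its own context window."""

    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2, window: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, turns, dim) sequence of utterance features for one modality
        mask = local_attention_mask(x.size(1), self.window).to(x.device)
        return self.encoder(x, mask=mask)


# Example: text may prefer a longer context than audio (assumed window values).
text_enc = IntraModalEncoder(window=12)
audio_enc = IntraModalEncoder(window=4)
utterances = torch.randn(2, 20, 256)     # 2 dialogues, 20 turns each
text_ctx = text_enc(utterances)          # (2, 20, 256)
audio_ctx = audio_enc(utterances)        # (2, 20, 256)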

Highlights

  • We extend the concept of emotion dynamics to multi-modal settings, taking account of both intra-modal and inter-modal emotion dynamics, or multi-modal emotion dynamics for short

  • The inter-modal emotion dynamics is the emotional influence that one modality receives from the other modalities at each conversation turn

  • To overcome the limitation of capturing only the self and inter-personal dependencies, as is done in vanilla emotion dynamics, we also model the temporal and spatial dependencies in multi-modal conversations

Summary

Related Work

For inter-modal emotion dynamics, the spatial dependency can be captured by interactive weighting across multi-modal features, and its modeling should consider two granularities of dependency. For intra-modal emotion dynamics, we employ Transformers for temporal modeling that satisfy the context preferences of different modalities. For inter-modal emotion dynamics, we design a Multi-Grained Interactive Fusion (MGIF) to deal with the prototype and representation dependencies across modalities. Several lines of work, e.g., transfer-learning ERC (Hazarika et al., 2019) and commonsense-knowledge ERC (Ghosal et al., 2020), have applied pre-trained models to the task of ERC; those approaches ignore the multi-modal emotion dynamics in conversations. Task-oriented fusion methods (Srivastava and Salakhutdinov, 2012) only capture the temporal dependency. The emotion dynamics is depicted on the left of Figure 2.
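
A hedged sketch of what a multi-grained interactive fusion step might look like, assuming the coarse grain is a pooled "prototype" summary of the other modality and the fine grain is its per-turn representations. The module name, pooling choice, and gating scheme are illustrative assumptions, not the paper's exact MGIF design.

# Hypothetical sketch of multi-grained interactive fusion across two modalities:
# fine-grained (representation) dependency via cross-modal attention, and
# coarse-grained (prototype) dependency via a pooled dialogue-level summary.
import torch
import torch.nn as nn


class MultiGrainedFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.fine_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, query_mod: torch.Tensor, other_mod: torch.Tensor) -> torch.Tensor:
        # query_mod, other_mod: (batch, turns, dim) features of two modalities.
        # Fine-grained dependency: attend over the other modality's per-turn features.
        fine, _ = self.fine_attn(query_mod, other_mod, other_mod)
        # Coarse-grained dependency: a mean-pooled prototype of the other
        # modality, broadcast to every turn.
        proto = other_mod.mean(dim=1, keepdim=True).expand_as(query_mod)
        # Gate the two granularities into the querying modality's stream.
        gate_vals = torch.sigmoid(self.gate(torch.cat([fine, proto], dim=-1)))
        return query_mod + gate_vals * fine + (1.0 - gate_vals) * proto


fusion = MultiGrainedFusion()
text_ctx = torch.randn(2, 20, 256)
audio_ctx = torch.randn(2, 20, 256)
text_fused = fusion(text_ctx, audio_ctx)   # (2, 20, 256)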

Context-Dependent Modeling
Prototype Dependency
Experiment
Implementation Details
Case Study