Abstract

Emotion perception and the diversity of generated responses are two key factors in multimodal dialogue generation, yet prior work in this area has not considered them jointly. In our model, we first extract features from each modality of the multimodal dialogue context, and use a heterogeneous graph neural network to represent the large graph composed of the dialogue history, audio, video, and the speakers' emotional states. We then use a conditional variational autoencoder to generate coherent and diverse responses. Extensive experiments on two multimodal datasets show that our model not only generates responses with appropriate emotion, but also achieves coherence and controllability, significantly outperforming previous state-of-the-art models.
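To make the response-generation stage concrete, the following is a minimal sketch of a conditional variational autoencoder in PyTorch, where the conditioning vector `c` stands in for the fused heterogeneous-graph readout of the multimodal context. The module names, dimensions, and the vector-reconstruction decoder are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Sketch of a CVAE: a latent z is sampled conditioned on the context c
    (e.g., a graph-encoded multimodal dialogue representation) and the
    response representation x, then decoded back conditioned on c."""

    def __init__(self, x_dim=256, c_dim=256, z_dim=64, h_dim=256):
        super().__init__()
        # Recognition network q(z | x, c), used during training
        self.recog = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.Tanh())
        self.recog_mu = nn.Linear(h_dim, z_dim)
        self.recog_logvar = nn.Linear(h_dim, z_dim)
        # Prior network p(z | c), used at inference to sample diverse responses
        self.prior = nn.Sequential(nn.Linear(c_dim, h_dim), nn.Tanh())
        self.prior_mu = nn.Linear(h_dim, z_dim)
        self.prior_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p(x | z, c); here it simply reconstructs the response vector
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.Tanh(), nn.Linear(h_dim, x_dim)
        )

    def forward(self, x, c):
        # Posterior parameters from the recognition network
        h_q = self.recog(torch.cat([x, c], dim=-1))
        mu_q, logvar_q = self.recog_mu(h_q), self.recog_logvar(h_q)
        # Prior parameters conditioned on the context only
        h_p = self.prior(c)
        mu_p, logvar_p = self.prior_mu(h_p), self.prior_logvar(h_p)
        # Reparameterisation trick: sample z from the posterior
        z = mu_q + torch.randn_like(mu_q) * torch.exp(0.5 * logvar_q)
        recon = self.decoder(torch.cat([z, c], dim=-1))
        # Closed-form KL(q(z|x,c) || p(z|c)) for diagonal Gaussians
        kl = 0.5 * torch.sum(
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p).pow(2)) / logvar_p.exp()
            - 1.0,
            dim=-1,
        )
        return recon, kl
```

At inference time one would sample z from the prior network given only the context, which is what yields diverse yet context-conditioned responses; the training loss would combine the reconstruction term with the KL term above.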
