Abstract

Multi-modal emotion reasoning in videos (MERV) has recently attracted increasing attention due to its potential applications in human-computer interaction. This task requires not only recognizing utterance-level emotions of conspicuous speakers, but also perceiving the emotions of non-speakers in videos. Existing methods focus on modeling multi-modal, multi-level contexts to capture emotion-relevant clues from the complex scenarios in videos. However, context information alone is far from enough to infer the emotion labels of non-speakers, owing to the large gap between the scenario situation and the emotion labels. Inspired by the observation that humans solve complex problems by drawing on experience and knowledge, we propose SK-MER, a Scenario-relevant Knowledge-enhanced Multi-modal Emotion Reasoning framework for the MERV task, which leverages external knowledge to enhance video scenario understanding and emotion reasoning. Specifically, we use scenario concepts extracted from videos to build knowledge subgraphs from external knowledge bases. The knowledge subgraphs are then used to obtain scenario-relevant knowledge representations through dynamic knowledge graph attention. Next, we incorporate the knowledge representations into context modeling to enhance emotion reasoning with external scenario-relevant knowledge. In addition, we propose a counterfactual knowledge representation learning approach to obtain more effective scenario-relevant knowledge representations. Extensive experiments on the MEmoR dataset show that the proposed SK-MER framework achieves new state-of-the-art results.
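To make the attention step concrete, the following is a minimal sketch, not the authors' implementation, of how a scenario-relevant knowledge representation can be pooled from a knowledge subgraph via query-conditioned attention. The function name `knowledge_attention`, the bilinear scoring matrix `W`, and the toy dimensions are all hypothetical, introduced purely for illustration.

```python
# Minimal sketch (assumed, not from the paper): attention-pooling node
# embeddings of a knowledge subgraph, conditioned on a multi-modal context
# query, into a single scenario-relevant knowledge representation.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_attention(query, node_embeddings, W):
    """Pool subgraph node embeddings into one knowledge representation.

    query:           (d_q,)   context vector for a person/moment (assumed)
    node_embeddings: (n, d_k) embeddings of concepts in the subgraph
    W:               (d_q, d_k) bilinear scoring matrix (learned in practice)
    """
    scores = node_embeddings @ (W.T @ query)   # (n,) relevance of each concept
    weights = softmax(scores)                  # attention over subgraph nodes
    return weights @ node_embeddings           # (d_k,) pooled knowledge vector

# Toy usage: 4 concept nodes with 8-dim embeddings, 6-dim context query.
rng = np.random.default_rng(0)
q = rng.normal(size=6)
K = rng.normal(size=(4, 8))
W = rng.normal(size=(6, 8))
print(knowledge_attention(q, K, W).shape)  # -> (8,)
```

In the framework described above, such a pooled vector would then be fed into context modeling alongside the multi-modal features; the "dynamic" aspect comes from conditioning the attention weights on the current scenario query rather than using fixed graph weights.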
