Abstract

Sarcasm, sentiment and emotion are tightly coupled: understanding one aids the understanding of the others, which makes their joint recognition in conversation a focus of research in artificial intelligence (AI) and affective computing. Three main challenges exist: context dependency, multimodal fusion and multitask interaction. However, most existing works fail to explicitly leverage and model the relationships among the related tasks. In this paper, we aim to address all three problems within a single multimodal joint framework. We therefore propose a multimodal multitask learning model based on the encoder–decoder architecture, termed M2Seq2Seq. At the heart of the encoder module are two attention mechanisms, i.e., intramodal (Ia) attention and intermodal (Ie) attention. Ia attention captures the contextual dependency between adjacent utterances, while Ie attention models multimodal interactions. On the decoder side, we design two kinds of multitask learning (MTL) decoders, i.e., single-level and multilevel decoders, to explore their potential. More specifically, the core of the single-level decoder is a masked outer-modal (Or) self-attention mechanism, whose main motivation is to explicitly model the interdependence among the sarcasm, sentiment and emotion recognition tasks. The core of the multilevel decoder comprises shared gating and task-specific gating networks. Comprehensive experiments on four benchmark datasets, MUStARD, Memotion, CMU-MOSEI and MELD, demonstrate the effectiveness of M2Seq2Seq over state-of-the-art baselines (e.g., CM-GCN, A-MTL), with significant improvements of 1.9%, 2.0%, 5.0%, 0.8%, 4.3%, 3.1%, 2.8%, 1.0%, 1.7% and 2.8% in terms of Micro F1.
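The overall architecture described above can be summarized in the minimal PyTorch-style sketch below. It is only an illustration of the encoder's Ia/Ie attention and the multilevel decoder's shared and task-specific gating; module names, feature dimensions, pooling choices and class counts (2 sarcasm, 3 sentiment, 7 emotion classes) are assumptions, not the authors' implementation, and the masked Or self-attention of the single-level decoder is omitted for brevity.

```python
# Illustrative sketch only; hyperparameters and layer choices are assumed, not from the paper.
import torch
import torch.nn as nn

class M2Seq2SeqSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Ia attention: contextual dependency between adjacent utterances, per modality.
        self.intra_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Ie attention: interactions across modalities within each utterance.
        self.inter_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Multilevel decoder: one shared gate plus one gate per task.
        self.shared_gate = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        self.task_gates = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid()) for _ in range(3)
        )
        # One head per task: sarcasm (2), sentiment (3), emotion (7) classes are assumed.
        self.heads = nn.ModuleList(nn.Linear(d_model, c) for c in (2, 3, 7))

    def forward(self, x):
        # x: (batch, n_utterances, n_modalities, d_model) per-modality utterance features.
        b, u, m, d = x.shape
        # Ia: attend over the utterance (time) axis within each modality.
        h = x.permute(0, 2, 1, 3).reshape(b * m, u, d)
        h, _ = self.intra_attn(h, h, h)
        h = h.reshape(b, m, u, d).permute(0, 2, 1, 3)
        # Ie: attend over the modality axis within each utterance.
        g = h.reshape(b * u, m, d)
        g, _ = self.inter_attn(g, g, g)
        fused = g.mean(dim=1).reshape(b, u, d)           # pool modalities per utterance
        shared = self.shared_gate(fused) * fused         # shared gating
        return [head(gate(shared) * shared)              # task-specific gating + head
                for gate, head in zip(self.task_gates, self.heads)]

# Example: 4 conversations, 10 utterances, 3 modalities (text/audio/vision), 128-d features.
logits = M2Seq2SeqSketch()(torch.randn(4, 10, 3, 128))
```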
