SMIN: Semi-Supervised Multi-Modal Interaction Network for Conversational Emotion Recognition

Zheng Lian, Bin Liu, Jianhua Tao

doi:10.1109/taffc.2022.3141237

Abstract

Conversational emotion recognition is a crucial research topic in human-computer interaction. Because annotation is costly and label ambiguity is inevitable, collecting large amounts of labeled data is challenging and expensive, which restricts the performance of current fully-supervised methods in this domain. To address this problem, researchers attempt to distill knowledge from unlabeled data via semi-supervised learning. However, most of these semi-supervised methods ignore multimodal interactive information, although recent works have shown that such interactive information is essential for emotion recognition. To this end, we propose a novel framework that integrates semi-supervised learning with multimodal interactions, called "Semi-supervised Multi-modal Interaction Network (SMIN)". SMIN contains two semi-supervised modules, the "Intra-modal Interactive Module (IIM)" and the "Cross-modal Interactive Module (CIM)", which learn intra- and cross-modal interactions, respectively. These two modules leverage additional unlabeled data to extract emotion-salient representations. To capture additional contextual information, we use hierarchical recurrent networks followed by a hybrid fusion strategy to integrate multimodal features. These multimodal features are then used for conversational emotion recognition. Experimental results on four benchmark datasets (i.e., IEMOCAP, MELD, CMU-MOSI and CMU-MOSEI) demonstrate that SMIN outperforms existing state-of-the-art methods on emotion recognition.
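The abstract mentions a hybrid fusion strategy but does not spell out its details. As a rough illustration only, the sketch below shows one common reading of "hybrid fusion": blending a feature-level prediction (made from concatenated modality features) with a decision-level prediction (the average of per-modality predictions). The function names, the linear classifiers, the toy weights, and the mixing factor `alpha` are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights):
    """Score each class; `weights` holds one row of coefficients per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def hybrid_fusion(modal_features, modal_weights, joint_weights, alpha=0.5):
    """Blend feature-level and decision-level fusion (hypothetical helper).

    modal_features: dict mapping modality name -> feature vector
    modal_weights:  dict mapping modality name -> per-class weight rows
    joint_weights:  per-class weight rows over the concatenated features
    alpha:          assumed mixing factor between the two fusion branches
    """
    names = sorted(modal_features)  # fixed order for concatenation

    # Decision-level branch: classify each modality, then average.
    per_modal = [softmax(linear(modal_features[n], modal_weights[n])) for n in names]
    n_classes = len(joint_weights)
    late = [sum(p[c] for p in per_modal) / len(per_modal) for c in range(n_classes)]

    # Feature-level branch: concatenate all modalities, classify once.
    joint = [x for n in names for x in modal_features[n]]
    early = softmax(linear(joint, joint_weights))

    return [alpha * e + (1 - alpha) * l for e, l in zip(early, late)]


# Toy usage with made-up numbers: three modalities, three emotion classes.
features = {"audio": [0.2, -0.1], "text": [0.5, 0.3], "video": [0.0, 0.4]}
per_modality = {n: [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]] for n in features}
joint = [[0.1] * 6, [0.2] * 6, [-0.3] * 6]
probabilities = hybrid_fusion(features, per_modality, joint)
```

Because both branches output probability distributions and `alpha` lies between 0 and 1, the blended result is itself a valid distribution over the emotion classes.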


Journal: IEEE Transactions on Affective Computing
Publication Date: Jul 1, 2023
Citations: 21

Similar Papers

Analyzing audiovisual data for understanding user's emotion in human−computer interaction environment
Juan Yang ... Xu Du
Data Technologies and Applications | VOL. 58
01 Nov 2023

Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition.
Xiaodong Liu ... Songyang Li
Computational Intelligence and Neuroscience | VOL. 2021
01 Jan 2020

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation
Xudong Shen ... Xinyi Gan
Neurocomputing | VOL. 582
16 Mar 2024

Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition
Minying Liu ... Shuxin Zhuang
Speech Communication | VOL. 156
22 Nov 2023
