M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

Vishal Chudasama, Pankaj Wasnik, Nirmesh Shah, Purbayan Kar, Ashish Gudmalwar, Naoyuki Onoe

doi:10.1109/cvprw56347.2022.00511

Abstract

Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic human-machine interaction. In conversational videos, emotion can be present in multiple modalities, i.e., audio, video, and transcript. However, due to the inherent characteristics of these modalities, multi-modal ERC has always been considered a challenging undertaking. Existing ERC research focuses mainly on the text of a conversation and ignores the other two modalities. We anticipate that emotion recognition accuracy can be improved by employing a multi-modal approach. Thus, in this study, we propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities. It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data. We introduce a new feature extractor to extract latent features from the audio and visual modalities. The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data. In the domain of ERC, existing methods perform well on one benchmark dataset but not on others. Our results show that the proposed M2FNet architecture outperforms all other methods in terms of weighted average F1 score on the well-known MELD and IEMOCAP datasets and sets a new state-of-the-art performance in ERC.
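The abstract reports results as a weighted average F1 score. As a minimal sketch of how that metric is commonly computed (per-class F1, weighted by how often each class occurs in the ground truth), the following may help. The function name `weighted_f1` and the toy emotion labels are illustrative assumptions, not taken from the paper, and the authors' evaluation code may differ in detail.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by class support in y_true.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    support = Counter(y_true)  # number of true utterances per emotion class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of y_true
    return score


# Toy example with made-up labels (not data from MELD or IEMOCAP).
y_true = ["joy", "joy", "joy", "anger"]
y_pred = ["joy", "joy", "anger", "anger"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its support, frequent classes dominate the score, which matters for imbalanced emotion datasets.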

Similar Papers

Deep emotion recognition in textual conversations: a survey
Patrícia Pereira ... Joao Paulo Carvalho
Artificial Intelligence Review | VOL. 58
07 Nov 2024

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Soujanya Poria ... Gautam Naik
01 Jan 2019

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation
Xudong Shen ... Xinyi Gan
Neurocomputing | VOL. 582
16 Mar 2024

MRSLN: A Multimodal Residual Speaker-LSTM Network to alleviate the over-smoothing issue for Emotion Recognition in Conversation
Nannan Lu ... Jiansheng Qian
Neurocomputing | VOL. 580
06 Mar 2024
