Abstract
In recent years, multimodal emotion recognition models have increasingly relied on pre-trained networks and attention mechanisms in pursuit of higher accuracy, which raises the training burden and slows both training and inference. To strike a balance between speed and accuracy, this paper proposes a speed-optimized multimodal architecture for speech and text emotion recognition. In the feature extraction stage, a lightweight residual graph convolutional network (ResGCN) serves as the speech feature extractor, and an efficient RoBERTa pre-trained network serves as the text feature extractor. A sparse cross-modal encoder (SCME) with reduced algorithmic complexity is then proposed to fuse these two types of features. Finally, a new gated fusion module (GF) weights the resulting representations and feeds them into a fully connected (FC) layer for classification. The proposed method is evaluated on the IEMOCAP and MELD datasets, achieving weighted accuracies (WA) of 82.4% and 65.0%, respectively, which is higher than the compared methods while maintaining acceptable training and inference speed.
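The overall pipeline described above (speech encoder, text encoder, cross-modal fusion, gated weighting, FC classifier) can be illustrated with a minimal sketch. This is not the paper's implementation: the ResGCN and RoBERTa encoders are replaced by placeholder projections, the SCME is stood in for by a single cross-attention block, and all names, dimensions, and the number of classes are assumptions for illustration only.

```python
# Minimal sketch of the described pipeline; module internals are placeholders,
# not the paper's actual ResGCN, RoBERTa, or SCME implementations.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Weights two representations with a learned sigmoid gate (stand-in for GF)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a, b):
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1 - g) * b


class SpeechTextEmotionClassifier(nn.Module):
    """Speech and text features -> cross-modal fusion -> gated fusion -> FC."""
    def __init__(self, dim=256, num_classes=4):  # num_classes is a placeholder
        super().__init__()
        # Placeholders for the ResGCN speech encoder and RoBERTa text encoder;
        # here we assume pre-extracted features of size `dim` per time step.
        self.speech_proj = nn.Linear(dim, dim)
        self.text_proj = nn.Linear(dim, dim)
        # Stand-in for the sparse cross-modal encoder (SCME): a single
        # multi-head attention block with speech queries attending to text.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fusion = GatedFusion(dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, speech_feats, text_feats):
        s = self.speech_proj(speech_feats)    # (B, T_speech, dim)
        t = self.text_proj(text_feats)        # (B, T_text, dim)
        cross, _ = self.cross_attn(s, t, t)   # cross-modal interaction
        fused = self.fusion(cross.mean(dim=1), s.mean(dim=1))
        return self.fc(fused)


# Usage with dummy feature tensors
model = SpeechTextEmotionClassifier()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 30, 256))
print(logits.shape)  # torch.Size([2, 4])
```

The sketch only shows how the fused cross-modal representation and the unimodal speech representation could be combined by a gate before classification; the paper's actual SCME sparsity pattern and GF formulation are described in the full text.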