Abstract

Emotion recognition has been studied extensively in single modalities over the last decade. However, humans usually express their emotions through multiple modalities such as voice, facial expressions, and text. This paper proposes a new method to learn a joint emotion representation for multimodal emotion recognition. Emotion-related features for speech audio are learned with an unsupervised triplet-loss objective, and a text-to-text transformer network is used to extract text embeddings that capture latent emotional meaning. Transfer learning provides a powerful and reusable technique for fine-tuning emotion recognition models pretrained on large audio and text datasets, respectively. The extracted emotional information from the speech audio and the text embeddings is processed by dedicated transformer networks, and multimodal fusion is implemented by a deep co-attention transformer network built with an alternating co-attention mechanism. Experimental results show that the proposed method for learning a joint emotion representation achieves good performance in multimodal emotion recognition.
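
To make the fusion step concrete, the sketch below shows one plausible reading of an alternating co-attention transformer in PyTorch: each modality's tokens attend over the other modality's tokens, the two streams are pooled, and the fused vector is classified. The abstract does not give implementation details, so the embedding dimension, number of attention heads, stack depth, number of emotion classes, and the mean-pooling head are all illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn


class CoAttentionBlock(nn.Module):
    """One alternating co-attention step: each modality attends to the other."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.audio_attends_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_attends_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_audio = nn.LayerNorm(dim)
        self.norm_text = nn.LayerNorm(dim)

    def forward(self, audio, text):
        # Audio queries attend over text keys/values, and vice versa.
        a, _ = self.audio_attends_text(audio, text, text)
        t, _ = self.text_attends_audio(text, audio, audio)
        return self.norm_audio(audio + a), self.norm_text(text + t)


class CoAttentionFusion(nn.Module):
    """Stack of co-attention blocks followed by a classification head.

    Depth and class count are placeholders, not values from the paper.
    """

    def __init__(self, dim=256, depth=4, num_emotions=7):
        super().__init__()
        self.blocks = nn.ModuleList(CoAttentionBlock(dim) for _ in range(depth))
        self.classifier = nn.Linear(2 * dim, num_emotions)

    def forward(self, audio, text):
        for block in self.blocks:
            audio, text = block(audio, text)
        # Pool each modality over its sequence and classify the fused vector.
        fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example: a batch of 8 utterances with 50 audio frames and 30 text tokens,
# both projected to 256-d features (standing in for the triplet-loss speech
# embeddings and the text-to-text transformer embeddings described above).
audio_feats = torch.randn(8, 50, 256)
text_feats = torch.randn(8, 30, 256)
logits = CoAttentionFusion()(audio_feats, text_feats)  # shape: (8, num_emotions)
```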
