Abstract

In the field of emotion recognition, analyzing emotions from speech alone (single-modal speech emotion recognition) has several limitations, including limited data volume and low accuracy. Additionally, single-task models generalize poorly and fail to fully exploit related information. To address these issues, this paper proposes a new bi-modal, bi-task emotion recognition model that introduces multi-task learning on the Transformer architecture. On one hand, unsupervised contrastive predictive coding is used to extract denser features from the data while preserving self-information and context-related information. On the other hand, robustness against interfering information is enhanced through self-supervised contrastive learning. Furthermore, the proposed model uses a modality fusion module that combines textual and audio information to implicitly align features from the two modalities. The model achieved weighted accuracy (WA) of 82.3% and 83.5% on the IEMOCAP and RAVDESS datasets, respectively, and unweighted accuracy (UA) of 83.0% and 82.4%, improving on existing methods.
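The abstract does not give the exact architecture, but the components it names (cross-modal fusion of audio and text features plus a contrastive auxiliary objective) can be illustrated with a minimal PyTorch sketch. The module names, dimensions, pooling choices, and the InfoNCE formulation below are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): cross-attention fusion of audio and
# text features plus an InfoNCE-style contrastive loss, illustrating the kind
# of bi-modal fusion and contrastive auxiliary task the abstract describes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Fuse audio and text sequences with multi-head cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Audio frames attend to text tokens (implicit cross-modal alignment).
        fused, _ = self.attn(query=audio, key=text, value=text)
        fused = self.norm(audio + fused)   # residual connection + layer norm
        pooled = fused.mean(dim=1)         # utterance-level mean pooling
        return self.classifier(pooled)     # emotion logits


def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: matched (anchor, positive) pairs are positives;
    all other pairings in the batch serve as negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0))
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    B, T_a, T_t, D = 8, 120, 30, 256
    audio_feats = torch.randn(B, T_a, D)   # e.g. CPC-encoded speech features
    text_feats = torch.randn(B, T_t, D)    # e.g. text token embeddings

    model = CrossModalFusion(dim=D, num_classes=4)
    emotion_logits = model(audio_feats, text_feats)

    # Auxiliary contrastive task on pooled utterance-level representations.
    contrastive_loss = info_nce(audio_feats.mean(1), text_feats.mean(1))
    print(emotion_logits.shape, contrastive_loss.item())
```

In a multi-task setup of this kind, the classification loss on the emotion logits and the contrastive loss would typically be combined as a weighted sum; the weighting is an additional assumption not specified in the abstract.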
