Analyzing audiovisual data for understanding user's emotion in human–computer interaction environment

Juan Yang, Zhenkun Li, Xu Du

doi:10.1108/dta-08-2023-0414

Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant ways for human beings to express their emotional states in daily communication. Automatic and accurate audiovisual emotion recognition is therefore important for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from these two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN integrates key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, a local maximum–based content analysis is designed to extract key-frames from videos in order to eliminate data redundancy. Two modules, the “Multi-head Attention-based Intra-modality Interaction Module” and the “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities can provide more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy can improve accuracy by more than 2.79 per cent. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion can lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.
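The abstract names two building blocks, local maximum-based key-frame extraction and attention-based cross-modality interaction, without giving their details. The following is only a minimal sketch of the general ideas, not the authors' KE-AFN implementation. It assumes that frame "content" is measured by the mean absolute intensity change between consecutive frames, and it simplifies the attention to a single head with no learned projections (keys and values are the context features themselves). All function names (`frame_differences`, `extract_key_frames`, `cross_attention`) and the toy data are hypothetical.

```python
import math


def frame_differences(frames):
    """Mean absolute intensity change between consecutive frames.

    Assumption: each frame is a flat sequence of pixel intensities,
    and all frames have the same length.
    """
    return [
        sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]


def extract_key_frames(frames):
    """Indices of frames at strict local maxima of the change curve."""
    d = frame_differences(frames)
    # d[i] measures the change from frame i to frame i + 1, so a peak at
    # d[i] is attributed to frame i + 1. Peaks at either end of the curve
    # are ignored because they lack a neighbour on one side.
    return [i + 1 for i in range(1, len(d) - 1) if d[i - 1] < d[i] > d[i + 1]]


def cross_attention(queries, contexts):
    """Single-head scaled dot-product attention.

    Each query vector (one modality) attends over the context vectors
    (the other modality). Simplification: keys == values == contexts,
    with no learned projection matrices.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / scale for c in contexts]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, contexts))
             for j in range(len(contexts[0]))]
        )
    return attended


# Toy video: seven 2-pixel frames with abrupt changes at frames 2 and 5.
frames = [[0, 0], [0, 0], [5, 5], [5, 5], [5, 5], [9, 9], [9, 9]]
key_idx = extract_key_frames(frames)  # [2, 5]

# Toy audio features attending over the visual features of the key-frames.
visual_feats = [[float(v) for v in frames[i]] for i in key_idx]
audio_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_given_visual = cross_attention(audio_feats, visual_feats)
```

In this sketch only the frames at peaks of visual change are kept, which is one way to read "eliminating data redundancy"; a multi-head variant would split each feature vector into several sub-vectors and run the same attention on each before concatenating the results.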

Journal: Data Technologies and Applications
Publication Date: Nov 1, 2023
Citations: 1

