A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

Chujie Xu, Tiejun Li, Zhansheng Yuan, Wenjie Zheng, Yong Du, Jingzi Wang

doi:10.1111/coin.12607

Abstract

Emotion recognition in conversations (ERC) is increasingly being applied in various IoT devices. Deep learning‐based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods adopt attention mechanisms to fuse information from different modalities, they ignore the complementarity between modalities. The joint cross‐attention model was introduced to alleviate this issue; however, it does not exploit multi‐scale feature information from the different modalities. Moreover, contextual relationships play an important role in feature extraction for expression recognition. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers as well as contextual relationships for facial expression recognition based on audio‐visual (A‐V) information. Specifically, we adopt separate deep networks to extract features from each modality. For the V modality, we construct graph data from patch embeddings extracted by the transformer encoder, and we combine the transformer encoder with graph convolution, which leverages intra‐modality relationships. The deep features from different layers are then fed to a hierarchical fusion module to enhance the feature representation. Finally, we use the joint cross‐attention mechanism to exploit complementary inter‐modality relationships. To validate the proposed model, we conducted experiments on the AffWild2 and CMU‐MOSI datasets. The results show that the proposed model achieves highly promising performance compared with the joint cross‐attention model and other methods.
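
The abstract describes the pipeline only at a high level. As a rough illustration, the NumPy sketch below wires together the steps it names: a graph over visual patch embeddings, graph convolution, hierarchical fusion of features from several encoder layers, and joint cross-attention between audio and visual features. Every concrete choice here is an assumption and not the authors' implementation. That includes the k-nearest-neighbour cosine graph, the symmetric-normalised GCN layer, fusion by concatenation plus a linear projection, the particular joint cross-attention formulation, and all function names, shapes and random weights.

```python
import numpy as np

def knn_adjacency(h, k):
    """Assumed graph construction: link each patch (row of h) to its k most cosine-similar patches."""
    hn = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = hn @ hn.T
    np.fill_diagonal(sim, -np.inf)            # a patch is not its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k nearest patches per row
    a = np.zeros(sim.shape)
    a[np.repeat(np.arange(h.shape[0]), k), nbrs.ravel()] = 1.0
    return np.maximum(a, a.T)                 # make the graph undirected

def gcn_layer(h, a, w):
    """Graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = a + np.eye(a.shape[0])            # add self-loops
    d = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d[:, None] * d[None, :]  # symmetric normalisation
    return np.maximum(a_norm @ h @ w, 0.0)

def visual_features(patch_layers, w_gcn, w_fuse, k=4):
    """patch_layers: list of (L, P, dv) arrays, one per encoder layer (hypothetical input).
    Returns (L, dv) frame features after GCN, mean pooling and hierarchical fusion."""
    per_layer = []
    for patches in patch_layers:
        frames = [gcn_layer(p, knn_adjacency(p, k), w_gcn).mean(axis=0)
                  for p in patches]            # one pooled vector per frame
        per_layer.append(np.stack(frames))     # (L, dv)
    # Assumed hierarchical fusion: concatenate layer features, project back to dv.
    return np.concatenate(per_layer, axis=1) @ w_fuse

def joint_cross_attention(xa, xv, p):
    """xa: (L, da) audio, xv: (L, dv) visual; p holds hypothetical weight matrices."""
    j = np.concatenate([xa, xv], axis=1)       # joint representation, (L, da + dv)
    scale = np.sqrt(j.shape[1])
    ca = np.tanh(xa @ p["W_ja"] @ j.T / scale) # (L, L) audio-to-joint correlation
    cv = np.tanh(xv @ p["W_jv"] @ j.T / scale) # (L, L) visual-to-joint correlation
    ha = np.maximum(xa @ p["W_a"] + ca @ p["W_ca"], 0.0)  # (L, kd)
    hv = np.maximum(xv @ p["W_v"] + cv @ p["W_cv"], 0.0)  # (L, kd)
    att_a = ha @ p["W_ha"] + xa                # residual back to (L, da)
    att_v = hv @ p["W_hv"] + xv                # residual back to (L, dv)
    return np.concatenate([att_a, att_v], axis=1)

rng = np.random.default_rng(0)
L, P, dv, da, kd = 8, 16, 32, 24, 16           # frames, patches, dims (arbitrary)
patch_layers = [rng.standard_normal((L, P, dv)) for _ in range(2)]
xv = visual_features(patch_layers, rng.standard_normal((dv, dv)) * 0.1,
                     rng.standard_normal((2 * dv, dv)) * 0.1)
xa = rng.standard_normal((L, da))              # stand-in for audio-network features
d = da + dv
shapes = {"W_ja": (da, d), "W_jv": (dv, d), "W_a": (da, kd), "W_v": (dv, kd),
          "W_ca": (L, kd), "W_cv": (L, kd), "W_ha": (kd, da), "W_hv": (kd, dv)}
params = {name: rng.standard_normal(s) * 0.1 for name, s in shapes.items()}
fused = joint_cross_attention(xa, xv, params)
print(fused.shape)  # → (8, 56)
```

In this assumed formulation, `W_ca` and `W_cv` have one row per time step, so the sequence length `L` is fixed. A trained model would also learn all weights rather than sampling them at random.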

Journal: Computational Intelligence | Publication Date: Oct 25, 2023
Citations: 2

Similar Papers

Facial Expression and Sex Recognition in Schizophrenia and Depression
Benoit Bediou ... Marie-Anne Henaff
The Canadian Journal of Psychiatry | VOL. 50
01 Aug 2005

Adaptive graph convolutional collaboration networks for semi-supervised classification
Sichao Fu ... Xiao-Yuan Jing
Information Sciences | VOL. 611
17 Aug 2022

A Graph Convolutional Stacked Temporal Attention Neural Network for Traffic Flow Forecasting
Yushan Feng ... Fengxia Han
18 Jul 2022

Facial expression recognition in facial occlusion scenarios: A path selection multi-network
Liheng Ruan ... Jiaqi Li
Displays | VOL. 74
07 Jun 2022