Abstract

Affective dimension prediction from multi-modal data is becoming an increasingly attractive research field in artificial intelligence (AI) and human-computer interaction (HCI). Previous works have shown that discriminative features from multiple modalities are important for accurately recognizing emotional states. Recently, deep representations have proved effective for emotional state recognition. To investigate new deep spatial-temporal features and evaluate their effectiveness for affective dimension recognition, in this paper we propose: (1) combining a pre-trained 2D-CNN and a 1D-CNN to learn deep spatial-temporal features from video images and audio spectrograms; and (2) a Spatial-Temporal Graph Convolutional Network (ST-GCN) adapted to the facial landmark graph. To evaluate the effectiveness of the proposed spatial-temporal features for affective dimension prediction, we propose a Deep Bidirectional Long Short-Term Memory (DBLSTM) model for single-modality prediction as well as early-fusion and late-fusion prediction. For the liking dimension, we use the text modality. Experimental results on the AVEC2019 CES dataset show that the proposed spatial-temporal features and recognition model achieve promising results. On the development set, the concordance correlation coefficient (CCC) reaches $0.724$ for arousal and $0.705$ for valence; on the test set, the CCC is $0.513$ for arousal and $0.515$ for valence, outperforming the baseline system, whose corresponding CCCs are $0.355$ and $0.468$ for arousal and valence, respectively.
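The abstract does not give implementation details of the DBLSTM regressor, so the following PyTorch sketch is only a minimal illustration under stated assumptions: it takes a sequence of per-frame feature vectors (e.g. 2D-CNN, 1D-CNN, or ST-GCN embeddings) and emits one affective value (arousal or valence) per time step. The class name `DBLSTMRegressor` and all layer sizes are hypothetical, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DBLSTMRegressor(nn.Module):
    """Sketch of a deep bidirectional LSTM regressor for per-frame
    affective dimension prediction (sizes are illustrative only)."""

    def __init__(self, feature_dim, hidden_dim=128, num_layers=2):
        super().__init__()
        # Stacked bidirectional LSTM over the feature sequence
        self.blstm = nn.LSTM(
            input_size=feature_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
        )
        # Linear head mapping each time step to a single affective value
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, feature_dim)
        out, _ = self.blstm(x)             # (batch, time, 2 * hidden_dim)
        return self.head(out).squeeze(-1)  # (batch, time)
```

The same sequence model could in principle be applied to a single modality or to early-fused (concatenated) features; late fusion would instead combine the per-modality predictions.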
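For reference, the concordance correlation coefficient reported above combines Pearson correlation with agreement of means and variances, $\mathrm{CCC} = \frac{2\,\mathrm{cov}(x,y)}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$. A minimal NumPy implementation of this standard definition (not code from the paper) is:

```python
import numpy as np

def concordance_correlation_coefficient(y_true, y_pred):
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```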
