Integrating audio and visual modalities for multimodal personality trait recognition via hybrid deep learning.

Xiaoming Zhao, Zhiwei Tang, Xin Tao, Guoyu Wang, Yicheng Xu, Hongsheng Lu, Dandan Wang, Yuehui Liao

doi:10.3389/fnins.2022.1107284

Abstract

Personality trait recognition, which aims to infer people's psychological characteristics from first-impression behavioral data, has recently become an active research topic in psychology, affective neuroscience, and artificial intelligence. To take effective advantage of spatio-temporal cues in the audio and visual modalities, this paper proposes a new multimodal personality trait recognition method that integrates both modalities in a hybrid deep learning framework comprising convolutional neural networks (CNN), a bi-directional long short-term memory network (Bi-LSTM), and a Transformer network. In particular, a pre-trained deep audio CNN model is used to learn high-level segment-level audio features. A pre-trained deep face CNN model is used to separately learn high-level frame-level global scene features and local face features from each frame of the dynamic video sequences. These extracted deep audio-visual features are then fed into a Bi-LSTM and a Transformer network to individually capture long-term temporal dependencies, thereby producing the final global audio and visual features for downstream tasks. Finally, linear regression is employed to perform the audio-only and visual-only personality trait recognition tasks, followed by a decision-level fusion strategy that produces the final Big-Five personality scores and interview scores. Experimental results on the public ChaLearn First Impression-V2 personality dataset demonstrate the effectiveness of the method, which outperforms the other methods compared.
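
The decision-level fusion step can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the weighted average below is an assumption, as are the function name `fuse_scores`, the modality keys, the weights, and the example scores. It only shows the general idea of combining per-modality regression outputs into one score per trait.

```python
from typing import Dict, List, Sequence

# Big-Five traits plus the interview score mentioned in the abstract.
TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
    "interview",
)


def fuse_scores(
    modality_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-modality predictions with a normalized weighted average.

    Assumption: the paper's actual fusion rule is not given in the abstract;
    weighted averaging is one common decision-level strategy.
    """
    total = sum(weights[name] for name in modality_scores)
    if total <= 0:
        raise ValueError("modality weights must sum to a positive value")

    fused = [0.0] * len(TRAITS)
    for name, scores in modality_scores.items():
        if len(scores) != len(TRAITS):
            raise ValueError(f"{name}: expected {len(TRAITS)} scores")
        share = weights[name] / total  # normalized weight of this modality
        for i, score in enumerate(scores):
            fused[i] += share * score
    return fused


if __name__ == "__main__":
    # Hypothetical regression outputs for one video, one list per modality.
    predictions = {
        "audio": [0.52, 0.48, 0.45, 0.55, 0.50, 0.49],
        "scene": [0.60, 0.58, 0.51, 0.57, 0.54, 0.56],
        "face": [0.63, 0.61, 0.55, 0.59, 0.57, 0.60],
    }
    # Hypothetical weights; in practice they could be tuned on validation data.
    modality_weights = {"audio": 1.0, "scene": 1.0, "face": 2.0}
    for trait, value in zip(TRAITS, fuse_scores(predictions, modality_weights)):
        print(f"{trait}: {value:.3f}")
```

Normalizing the weights inside the function keeps the fused scores in the same range as the inputs, whatever scale the weights are given in.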

Journal: Frontiers in Neuroscience	Publication Date: Jan 6, 2023
Citations: 3	License type: CC BY 4.0

Similar Papers

Multi-stage glaucoma classification using pre-trained convolutional neural networks and voting-based classifier fusion.
Vijaya Kumar Velpula ... Lakhan Dev Sharma
Frontiers in Physiology | VOL. 14
13 Jun 2023

Developing a new deep learning CNN model to detect and classify highway cracks
Faris Elghaish ... Aso Hajirasouli
Journal of Engineering, Design and Technology | VOL. 20
16 Aug 2021

Prediction of the superimposed laser shot number for copper using a deep convolutional neural network.
K Rani ... K Hashimoto
Optics Express | VOL. 31
07 Jul 2023

Convolutional neural networks based efficient approach for classification of lung diseases
Fatih Demir ... Varun Bajaj
Health Information Science and Systems | VOL. 8
23 Dec 2019
