Abstract

The inherent dependencies between visual and aural elements are crucial for affective video content analysis, yet have not been successfully exploited. We therefore propose a multimodal deep regression Bayesian network (MMDRBN) to capture these dependencies for affective video content analysis. The regression Bayesian network (RBN) is a directed graphical model consisting of one latent layer and one visible layer. Due to the explaining-away effect in Bayesian networks (BN), the RBN captures both the dependencies among the latent variables given the observations and the dependencies among the visible variables. We propose a fast learning algorithm for the RBN. To build the MMDRBN, we first learn several RBNs layer by layer from the visual and audio modalities separately, and stack them into two deep networks. A joint representation is then extracted from the top layers of the two deep networks, capturing the high-order dependencies between the visual and audio modalities. Since exact inference in the MMDRBN is intractable, we initialize a feed-forward inference network from the MMDRBN by minimizing the Kullback-Leibler (KL) divergence between the two networks, and use this network to predict the valence or arousal score of video content. The back-propagation algorithm is adopted for fine-tuning the inference network. Experimental results on the LIRIS-ACCEDE database demonstrate that the proposed MMDRBN successfully captures the dependencies between visual and audio elements, and thus achieves better performance than state-of-the-art methods.
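
The prediction pipeline described above can be illustrated with a minimal sketch of the feed-forward inference network that is fine-tuned with back-propagation: two modality-specific stacks (standing in for the stacked RBNs), a joint layer over their top-layer representations, and a scalar regression head for a valence or arousal score. All layer sizes, feature dimensions, and activation choices below are assumptions for illustration, not values from the paper, and the RBN pre-training and KL-divergence-based initialization are omitted.

    # Illustrative sketch only: modality-specific stacks, a joint layer, and a
    # regression head, fine-tuned with back-propagation. Dimensions are assumed.
    import torch
    import torch.nn as nn

    class MultimodalInferenceNet(nn.Module):
        def __init__(self, visual_dim=1024, audio_dim=128, hidden_dim=256, joint_dim=128):
            super().__init__()
            # One stack per modality, standing in for the stacked RBNs.
            self.visual_stack = nn.Sequential(
                nn.Linear(visual_dim, hidden_dim), nn.Sigmoid(),
                nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            )
            self.audio_stack = nn.Sequential(
                nn.Linear(audio_dim, hidden_dim), nn.Sigmoid(),
                nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            )
            # Joint representation over the concatenated top-layer outputs.
            self.joint = nn.Sequential(
                nn.Linear(2 * hidden_dim, joint_dim), nn.Sigmoid(),
            )
            # Scalar regression head for a valence or arousal score.
            self.regressor = nn.Linear(joint_dim, 1)

        def forward(self, visual_feats, audio_feats):
            hv = self.visual_stack(visual_feats)
            ha = self.audio_stack(audio_feats)
            joint = self.joint(torch.cat([hv, ha], dim=-1))
            return self.regressor(joint).squeeze(-1)

    # Fine-tuning step against ground-truth affect scores (placeholder data).
    model = MultimodalInferenceNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    visual_feats = torch.randn(8, 1024)   # placeholder visual features
    audio_feats = torch.randn(8, 128)     # placeholder audio features
    targets = torch.rand(8)               # placeholder valence scores

    optimizer.zero_grad()
    loss = loss_fn(model(visual_feats, audio_feats), targets)
    loss.backward()
    optimizer.step()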
