Visual-Textual Emotion Analysis With Deep Coupled Video and Danmu Neural Networks

Chenchen Li, Miao Zhao, Jialin Wang, Hongwei Wang, Xiaotie Deng, Wenjie Li

doi:10.1109/tmm.2019.2946477

Abstract

User emotion analysis toward videos aims to automatically recognize the general emotional status of viewers from the multimedia content embedded in the online video stream. Existing works fall into two categories: 1) visual-based methods, which focus on visual content and extract a specific set of video features; however, it is generally hard to learn a mapping function from low-level video pixels to the high-level emotion space due to great intra-class variance; and 2) textual-based methods, which focus on user-generated comments associated with videos; however, the word representations learned by traditional linguistic approaches typically lack emotion information, and global comments usually reflect viewers’ high-level understanding rather than instantaneous emotions. To address these limitations, we propose to jointly utilize video content and user-generated texts for emotion analysis. In particular, we exploit a new type of user-generated text, i.e., “danmu,” which are real-time comments floating on the video that convey rich information about viewers’ emotional opinions. To enhance the emotion discriminativeness of words in textual feature extraction, we propose Emotional Word Embedding (EWE), which learns text representations by jointly considering their semantics and emotions. We then propose a novel visual-textual emotion analysis model with Deep Coupled Video and Danmu Neural networks (DCVDN), in which visual and textual features are synchronously extracted and fused to form a comprehensive representation by deep-canonically-correlated-autoencoder-based multi-view learning. Through extensive experiments on a self-crawled real-world video-danmu dataset, we show that DCVDN significantly outperforms the state-of-the-art baselines.
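The abstract names the fusion step (deep-canonically-correlated-autoencoder-based multi-view learning) without giving its form. As a rough illustration only, the sketch below shows the general shape of such an objective: reward correlation between the codes of the two views while penalizing reconstruction error. It is not the authors' network. The linear tied-weight encoders, the one-dimensional codes, the weight `lam`, and all function names are assumptions made for this toy example.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v)


def encode(x, w):
    """Hypothetical linear encoder: project a feature vector to a 1-D code."""
    return sum(xi * wi for xi, wi in zip(x, w))


def decode(z, w):
    """Hypothetical tied-weight decoder: map a 1-D code back to feature space."""
    return [z * wi for wi in w]


def squared_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def coupled_objective(video, danmu, w_v, w_t, lam=0.1):
    """Toy DCCAE-style loss: -corr(codes) + lam * mean reconstruction error.

    `video` and `danmu` are lists of feature vectors, one pair per sample.
    Lower is better: the codes of the two views should correlate, and each
    view should be recoverable from its own code.
    """
    z_v = [encode(x, w_v) for x in video]
    z_t = [encode(y, w_t) for y in danmu]
    recon = sum(
        squared_error(x, decode(zv, w_v)) + squared_error(y, decode(zt, w_t))
        for x, y, zv, zt in zip(video, danmu, z_v, z_t)
    ) / len(video)
    return -pearson(z_v, z_t) + lam * recon


def fuse(x, y, w_v, w_t):
    """Joint representation of one sample: the two view codes, concatenated."""
    return [encode(x, w_v), encode(y, w_t)]
```

With one-dimensional codes the canonical correlation reduces to a plain Pearson correlation, which keeps the example dependency-free. A real model would use deep nonlinear encoders and multi-dimensional codes.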

Journal: IEEE Transactions on Multimedia
Publication Date: Oct 18, 2019
Citations: 66
