Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.

Zhou Zhao, Zehan Song, Shuwen Xiao, Yueting Zhuang, Chujie Lu, Jun Xiao

doi:10.1109/tip.2020.2963950

Abstract

As a challenging task in visual information retrieval, open-ended long-form video question answering aims to automatically generate a natural-language answer from the referenced video content for a given question. However, existing video question answering work focuses mainly on short-form video. These methods may not transfer effectively to long-form video question answering, because they do not sufficiently model the semantic representation of long-form video content. In this paper, we study the problem of open-ended long-form video question answering from the viewpoint of hierarchical multi-modal conditional adversarial network learning. We propose a hierarchical attentional encoder network that learns a joint representation of the long-form video content and the given question using adaptive video segmentation. We then devise a reinforced decoder network that generates the natural-language answer for open-ended video question answering through multi-modal conditional adversarial network learning. We construct three large-scale open-ended video question answering datasets. Extensive experiments validate the effectiveness of our method.
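The abstract does not describe the encoder's internals. Purely as an illustration of the general idea of question-conditioned hierarchical attention over adaptively segmented video, the sketch below shows one generic way such an encoder could be wired. Every concrete choice here is a hypothetical assumption, not the authors' method: the segmentation rule (start a new segment where the cosine similarity of consecutive frame features falls below a `threshold`), dot-product attention, and element-wise fusion of the video and question vectors. The reinforced decoder and the adversarial training are not shown.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(items: List[Vector], query: Vector) -> Vector:
    """Question-conditioned dot-product attention: weighted sum of items."""
    weights = softmax([dot(v, query) for v in items])
    return [sum(w * v[d] for w, v in zip(weights, items)) for d in range(len(items[0]))]


def adaptive_segments(frames: List[Vector], threshold: float = 0.5) -> List[List[Vector]]:
    """Hypothetical segmentation rule: open a new segment wherever
    consecutive frame features have cosine similarity below `threshold`."""
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine(prev, cur) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def encode(frames: List[Vector], question: Vector, threshold: float = 0.5) -> Vector:
    """Two-level encoder sketch: frame-level attention inside each segment,
    then segment-level attention, then a hypothetical element-wise fusion."""
    segments = adaptive_segments(frames, threshold)
    segment_vecs = [attend(seg, question) for seg in segments]  # level 1: frames
    video_vec = attend(segment_vecs, question)                    # level 2: segments
    return [v * q for v, q in zip(video_vec, question)]           # joint representation


frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
question = [1.0, 0.0]
print(len(adaptive_segments(frames)))  # 2
print(encode(frames, question))
```

In this toy example the abrupt change between the second and third frames yields two segments, and because the question vector points along the first feature dimension, the attention weights favour the first segment. A learned model would replace the fixed threshold and raw dot products with trained parameters.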



Journal: IEEE Transactions on Image Processing
Publication Date: Jan 1, 2020
Citations: 62

Similar Papers

Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks
Zhou Zhao ... Jun Yu
01 Jul 2018

Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network
Zhou Zhao ... Shiliang Pu
01 Jul 2018

Video Question Answering via Hierarchical Spatio-Temporal Attention Networks
Zhou Zhao ... Deng Cai
01 Aug 2017

Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks.
Zhou Zhao ... Zhenxin Xiao
IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, Vol. 28
17 Jun 2019
