Abstract

Multi-turn video question answering is a challenging task in visual information retrieval: it requires generating an accurate answer from the referenced video content according to the visual conversation context and the given question. Existing visual question answering methods mainly address single-turn video question answering and cannot be applied directly to the multi-turn setting, because they do not adequately model the sequential conversation context. In this paper, we study multi-turn video question answering from the viewpoint of multi-stream hierarchical attention context reinforced network learning. We first propose a hierarchical attention context network for context-aware question understanding, which models the hierarchically sequential structure of the conversation context. We then develop a multi-stream spatio-temporal attention network to learn a joint representation of the dynamic video content and the context-aware question embedding. We next devise a multi-step reasoning process to enhance the multi-stream hierarchical attention context network. Finally, we predict the multiple-choice answer from the candidate answer set and further develop a reinforced decoder network to generate open-ended natural language answers for multi-turn video question answering. We construct two large-scale multi-turn video question answering datasets, and extensive experiments demonstrate the effectiveness of our method.
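
To make the described pipeline concrete, the following is a minimal PyTorch sketch of how the components named in the abstract could fit together: a hierarchical (word-level, then turn-level) attention encoder over past conversation turns, question-guided attention over two video feature streams, and a few reasoning steps before scoring multiple-choice candidates. All class names, feature dimensions, and fusion choices (additive fusion, mean pooling over streams, three reasoning steps) are illustrative assumptions rather than the paper's exact architecture, and the reinforced decoder for open-ended answer generation is omitted.

```python
# Illustrative sketch only; module names and dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalContextEncoder(nn.Module):
    """Word-level then turn-level GRUs with attention over past QA turns."""

    def __init__(self, vocab_size, emb_dim=300, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.turn_gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim, 1)

    def forward(self, history, question):
        # history: (batch, n_turns, n_words); question: (batch, n_words)
        b, t, w = history.shape
        turn_inputs = self.embed(history.view(b * t, w))
        _, turn_vec = self.word_gru(turn_inputs)           # (1, b*t, hid)
        turn_vec = turn_vec.squeeze(0).view(b, t, -1)      # (b, t, hid)
        turn_states, _ = self.turn_gru(turn_vec)           # (b, t, hid)
        # Attention over turns yields one conversation-context vector.
        alpha = F.softmax(self.attn(turn_states), dim=1)   # (b, t, 1)
        context = (alpha * turn_states).sum(dim=1)         # (b, hid)
        _, q_vec = self.word_gru(self.embed(question))
        # Context-aware question embedding: simple additive fusion (assumed).
        return context + q_vec.squeeze(0)


class MultiStreamAttention(nn.Module):
    """Question-guided attention over appearance and motion feature streams."""

    def __init__(self, feat_dim=2048, hid_dim=512):
        super().__init__()
        self.proj = nn.ModuleDict({
            "appearance": nn.Linear(feat_dim, hid_dim),
            "motion": nn.Linear(feat_dim, hid_dim),
        })
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, streams, query):
        # streams: dict of (batch, n_segments, feat_dim); query: (batch, hid)
        pooled = []
        for name, feats in streams.items():
            h = torch.tanh(self.proj[name](feats) + query.unsqueeze(1))
            alpha = F.softmax(self.score(h), dim=1)
            pooled.append((alpha * self.proj[name](feats)).sum(dim=1))
        return torch.stack(pooled, dim=0).mean(dim=0)      # (batch, hid)


class MultiTurnVideoQA(nn.Module):
    """Ties the pieces together with multi-step reasoning and answer scoring."""

    def __init__(self, vocab_size, n_steps=3, hid_dim=512):
        super().__init__()
        self.context_enc = HierarchicalContextEncoder(vocab_size, hid_dim=hid_dim)
        self.video_attn = MultiStreamAttention(hid_dim=hid_dim)
        self.update = nn.GRUCell(hid_dim, hid_dim)
        self.n_steps = n_steps

    def forward(self, streams, history, question, candidates):
        # candidates: (batch, n_choices, hid) precomputed answer embeddings.
        query = self.context_enc(history, question)
        for _ in range(self.n_steps):                      # multi-step reasoning
            video = self.video_attn(streams, query)
            query = self.update(video, query)
        # Multiple-choice answer: dot-product score against each candidate.
        return torch.einsum("bh,bch->bc", query, candidates)


if __name__ == "__main__":
    model = MultiTurnVideoQA(vocab_size=1000)
    streams = {
        "appearance": torch.randn(2, 20, 2048),  # frame-level features
        "motion": torch.randn(2, 20, 2048),      # clip-level features
    }
    history = torch.randint(0, 1000, (2, 4, 10))   # 4 past QA turns
    question = torch.randint(0, 1000, (2, 10))
    candidates = torch.randn(2, 5, 512)            # 5 candidate answers
    print(model(streams, history, question, candidates).shape)  # torch.Size([2, 5])
```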
