Abstract

Great advances in computer vision and natural language processing have driven significant progress in visual question answering (VQA). In the VQA task, the visual representation is essential for understanding the image content. However, traditional methods rarely exploit the context information of visual features related to the question, or relation-aware information, to capture a valuable visual representation. Therefore, a gated relation-aware model is proposed to capture an enhanced visual representation for accurate answer prediction. The gated relation-aware module learns relation-aware information between a visual feature and, respectively, its context and a particular object in the image. In addition, the proposed module filters out unnecessary relation-aware information through a gate guided by the semantic representation of the question. The results of the conducted experiments show that the gated relation-aware module yields a significant improvement on all answer categories.
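The abstract does not specify the gating mechanism in detail; a minimal sketch of question-guided gating over relation-aware visual features, using NumPy with hypothetical dimensions and randomly initialized weights standing in for learned parameters, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 8  # hypothetical feature dimension (assumption, not from the paper)

# Hypothetical inputs: a question semantic representation and
# relation-aware visual features for five image objects.
q = rng.standard_normal(d)         # question representation
R = rng.standard_normal((5, d))    # relation-aware features, one per object

# Question-guided gate: a projection of q (random weights here,
# learned in practice) squashed into (0, 1) per dimension.
W = rng.standard_normal((d, d))
gate = sigmoid(W @ q)              # shape (d,)

# The gate suppresses unnecessary relation-aware information
# before the features feed into answer prediction.
V = R * gate                       # broadcast over the object axis

print(V.shape)  # (5, 8)
```

This illustrates only the general idea of multiplicative gating conditioned on the question; the paper's actual module, its dimensions, and its training objective may differ.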
