Abstract

Visual question generation aims to focus on target objects in an image to generate questions with specific questioning purposes. Existing studies mainly use an answer to extract the target object corresponding to the questioning purpose. However, answers fail to map accurately and completely to every target object: the objects corresponding to an answer may be ambiguous, or the answer may describe a relationship among multiple objects. To address this problem, we propose a content-controlled question generation model that generates questions based on a target object set specified in an image. Considering that the target objects contribute differently during generation, we design a recurrent generative architecture that explicitly controls attention to different objects and their corresponding image information at each generative stage. Extensive experiments on the VQA v2.0 and Visual7w datasets show that the proposed model outperforms state-of-the-art models and can controllably generate questions with the specified content.
