Abstract

The visual question generation task aims to generate high-quality questions about a given image. To make this task applicable to various scenarios, e.g., the growing demand for exams, it is important to generate diverse questions. Existing methods control diverse question generation through different question types, e.g., “what” and “when.” Although different question types lead to diversity in wording, they cannot guarantee semantic diversity when the questions ask about the same objects. Research in psychology shows that humans attend to different objects in an image according to their preferences, which is beneficial for constructing semantically diverse questions. Motivated by this finding, we propose a multi-selector visual question generation (MS-VQG) model that focuses on different objects to generate diverse questions. Specifically, our MS-VQG model employs multiple selectors that imitate different humans, each selecting different objects in a given image. Conditioned on these different selected objects, the model generates a distinct question for each selector. Extensive experiments on two datasets show that our proposed model outperforms the baselines in generating diverse questions.
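To make the multi-selector idea concrete, the sketch below shows one plausible way such selectors could be realized: each selector scores detected object features and keeps its own top-k objects, which would then condition a separate question decoder. This is a minimal illustration under our own assumptions; the class name MultiSelector, the linear scoring heads, the top-k selection scheme, and all dimensions are hypothetical, as the abstract does not specify the architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSelector(nn.Module):
    """Illustrative multi-selector (hypothetical design): each selector
    scores the detected object features and keeps its own top-k objects,
    imitating different human preferences."""

    def __init__(self, feat_dim: int, num_selectors: int, top_k: int = 3):
        super().__init__()
        self.top_k = top_k
        # One scoring head per selector, so each selector can learn to
        # prefer different objects in the same image.
        self.scorers = nn.ModuleList(
            nn.Linear(feat_dim, 1) for _ in range(num_selectors)
        )

    def forward(self, obj_feats: torch.Tensor):
        # obj_feats: (batch, num_objects, feat_dim) region features,
        # e.g., from an off-the-shelf object detector.
        selected = []
        for scorer in self.scorers:
            scores = scorer(obj_feats).squeeze(-1)            # (batch, num_objects)
            weights = F.softmax(scores, dim=-1)
            topk = weights.topk(self.top_k, dim=-1).indices   # per-selector objects
            idx = topk.unsqueeze(-1).expand(-1, -1, obj_feats.size(-1))
            selected.append(obj_feats.gather(1, idx))         # (batch, top_k, feat_dim)
        return selected  # one object subset per selector

# Toy usage: 2 images, 36 detected objects, 3 selectors -> 3 object subsets,
# each of which would condition its own question decoder.
feats = torch.randn(2, 36, 512)
subsets = MultiSelector(feat_dim=512, num_selectors=3)(feats)
print([s.shape for s in subsets])  # three tensors of shape (2, 3, 512)
```

In this sketch, semantic diversity comes from the selectors attending to disjoint object subsets rather than from varying the question type, matching the intuition described in the abstract.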
