Abstract

An approach for visual question answering is described. The proposed network solves an open-ended problem with candidate answer recommendation, which is generated solely from the given question. Then, the score from the proposed question-aware prediction module and the score from candidate answer recommendation module are combined to determine the final composite score. The proposed approach uses the bag-of-words framework to understand questions, instead of a complex and neural-network-based module; therefore, an additional dataset to pre-train the language model is not required. Although the proposed approach does not achieve the state-of-the-art performance overall, the approach performs the best for certain types of questions with a small amount of training data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call