Abstract

Recently, visual question answering (VQA) systems have been adopted across various research disciplines to extract meaningful information from images and communicate it to humans. Implementing a VQA system therefore combines the fields of computer vision and natural language processing. Existing VQA systems face many open research issues, such as improper counting of occluded objects, single-word answers, and time-specific answers, owing to VQA's wide assortment of applications and the breadth of its research area. In this paper, we present an attention-based visual question answering (A-VQA) method for handling improper counting of occluded objects. The A-VQA system generates a textual answer by extracting image and textual features and applying a multi-layer attention mechanism to these features. It handles object-recognition, counting, color, and activity-recognition types of visual questions. The Visual Genome dataset is used for training and evaluating the A-VQA method.

Keywords: Natural language processing; Computer vision; Visual question answering; Attention model; YOLOv3; LSTM; Object detection; Occlusion
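The abstract does not specify the exact attention architecture, but the multi-layer attention it describes can be illustrated with a minimal sketch: a question encoding (e.g. from an LSTM) attends over per-region image features, and the resulting context refines the query across stacked layers. All names, dimensions, and the identity score matrix `W` below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(region_feats, query, W):
    """One attention layer: score each image region against the question
    query, then return the attention-weighted sum of region features."""
    scores = region_feats @ W @ query      # one score per region, shape (R,)
    weights = softmax(scores)              # attention distribution over regions
    context = weights @ region_feats       # weighted image summary, shape (D,)
    return context, weights

# Hypothetical toy setup: 4 detected regions with 8-dim features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))
question = rng.normal(size=8)              # stand-in for an LSTM question encoding
W = np.eye(8)                              # placeholder bilinear score matrix

# Stacked (multi-layer) attention: each layer refines the query with its context,
# so later layers can focus on regions (e.g. partially occluded objects)
# that the first pass under-weighted.
q = question
for _ in range(2):
    ctx, w = attention_step(regions, q, W)
    q = q + ctx                            # refined query for the next layer
```

In a full A-VQA pipeline, `regions` would come from an object detector such as YOLOv3 and the final refined query `q` would feed an answer decoder.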
