Abstract

With the application of deep learning methods to image processing, image-related intelligent interaction technology has developed rapidly. Vision and language are two core faculties through which human intelligence understands the real world, and they are also basic components of artificial intelligence; each has been studied extensively in its own field. As deep learning has spread through computer vision and natural language processing, visual question answering (VQA), which spans both disciplines, has become a research hotspot in recent years. VQA for intelligent interaction gathers image information by asking questions about the content of an image, with the aim of enriching image understanding. As an emerging research direction, VQA still faces substantial challenges that call for further study. Through a comprehensive comparison and analysis of existing VQA models and methods, this paper summarizes the shortcomings and development directions of current research. It analyzes how several VQA models process the image input and the question input, describes their working principles, and reviews the public datasets commonly used with them. It concludes that extending structured knowledge bases and applying mature techniques from text question answering and natural language processing to VQA problems are the future development directions of VQA models.

