Abstract

Counting-based questions play a major part in Visual Question Answering (VQA), and the most challenging factor is counting the different objects present in an image. Recently, more attention has been paid to designing count-aided VQA models. A VQA system responds to a question about an image with an appropriate answer, yet complex questions remain difficult for such systems to answer. Earlier models still struggle to count the various objects within an image because they fail to select informative features and lack fine-grained representations. To preserve the image representation, this paper proposes a new VQA model based on a heuristic approach that uses serial cascaded deep learning methods. Initially, standard image and text data are gathered and pre-processed. Features are then extracted from both modalities: deep image features are obtained with Visual Geometry Group 16 (VGG16), and text features are extracted with a Text Convolutional Neural Network (TCNN). Next, optimal weighted fused features are obtained, where the fusion weights are tuned by the Improved Tuna Swarm Optimization (ITSO) algorithm. Finally, the counting answers for the given queries are produced by a Serial Cascaded Recurrent Neural Network with Attention Mechanism-based Long Short-Term Memory (SCRAM-LSTM). The performance is evaluated with a range of metrics and compared with conventional models, and the findings indicate that the proposed model offers superior performance in estimating the appropriate answers. The proposed work is therefore suited to potential applications such as helping blind or visually impaired people obtain information, integration with image retrieval systems and search engines, and, in particular, vision and language systems.
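As an illustration of the weighted fusion step, the following is a minimal sketch, assuming the fused representation is formed by scaling each modality's feature vector by a scalar weight and concatenating the results. The abstract does not specify the fusion operator, so the function name `fuse_weighted`, the one-weight-per-modality scheme, and the toy values are all hypothetical; in the described pipeline the features would come from VGG16 and TCNN and the weights from ITSO.

```python
from typing import List, Sequence


def fuse_weighted(image_features: Sequence[float],
                  text_features: Sequence[float],
                  image_weight: float,
                  text_weight: float) -> List[float]:
    """Scale each modality by its weight, then concatenate.

    Assumption: one scalar weight per modality and concatenation as the
    fusion operator; the source does not state either detail.
    """
    scaled_image = [image_weight * value for value in image_features]
    scaled_text = [text_weight * value for value in text_features]
    return scaled_image + scaled_text


# Toy stand-ins for extracted features (hypothetical values).
image_features = [1.0, 2.0]   # would come from the image branch
text_features = [4.0]         # would come from the text branch

# Fixed weights here; an optimizer would search over these values.
fused = fuse_weighted(image_features, text_features,
                      image_weight=0.5, text_weight=0.25)
print(fused)
```

Under this assumption, an optimizer such as ITSO would treat `image_weight` and `text_weight` as the search variables and score each candidate by the downstream answer quality.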
