Abstract
The goal of medical visual question answering (Med-VQA) is to correctly answer a clinical question posed by a medical image. Medical images are fundamentally different from images in the general domain. As a result, using general domain Visual Question Answering (VQA) models to the medical domain is impossible. Furthermore, the large-scale data required by VQA models is rarely available in the medical arena. Existing approaches of medical visual question answering often rely on transfer learning with external data to generate good image feature representation and use cross-modal fusion of visual and language features to acclimate to the lack of labelled data. This research provides a new parallel multi-head attention framework (MaMVQA) for dealing with Med-VQA without the use of external data. The proposed framework addresses image feature extraction using the unsupervised Denoising Auto-Encoder (DAE) and language feature extraction using term-weighted question embedding. In addition, we present qf-MI, a unique supervised term-weighting (STW) scheme based on the concept of mutual information (MI) between the word and the corresponding class label. Extensive experimental findings on the VQA-RAD public medical VQA benchmark show that the proposed methodology outperforms previous state-of-the-art methods in terms of accuracy while requiring no external data to train the model. Remarkably, the presented MaMVQA model achieved significantly increased accuracy in predicting answers to both close-ended (78.68%) and open-ended (55.31%) questions. Also, an extensive set of ablations are studied to demonstrate the significance of individual components of the system.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.