Abstract

Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in healthcare services, the development of this technology is still at an early stage. On the one hand, Med-VQA tasks are highly challenging due to the massive diversity of clinical questions, which require different visual reasoning skills for different question types. On the other hand, medical images are complex in nature and very different from natural images, while current Med-VQA datasets are small-scale, containing only a few hundred radiology images, making it difficult to train a well-performing visual feature extractor. This paper addresses the above two critical issues. We propose a novel conditional reasoning mechanism, with a question-conditioned reasoning component and a type-conditioned reasoning strategy, to adaptively learn effective reasoning skills for different Med-VQA tasks. Further, we propose to pre-train a visual feature extractor for Med-VQA via contrastive learning on large amounts of unlabeled radiology images. The effectiveness of our proposals is validated by extensive experiments on existing Med-VQA benchmarks, which show significant improvements of our model in prediction accuracy over state-of-the-art methods. Our source code and pre-training dataset are released at https://github.com/Awenbocc/CPCR.
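To make the two proposals concrete, the sketch below illustrates (in PyTorch) how such components might be wired together; it is not the authors' implementation. It assumes a question-type router that dispatches each question to a question-conditioned reasoning head (e.g., closed-ended vs. open-ended), plus an InfoNCE-style contrastive loss for pre-training the visual encoder on unlabeled radiology images. All module and variable names are illustrative.

```python
# Minimal sketch of the two ideas in the abstract (illustrative, not the
# paper's actual architecture): type-conditioned routing over
# question-conditioned reasoning heads, and a contrastive pre-training loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuestionConditionedReasoner(nn.Module):
    """Gates visual features with a question-dependent sigmoid mask."""

    def __init__(self, q_dim: int, v_dim: int):
        super().__init__()
        self.gate = nn.Linear(q_dim, v_dim)

    def forward(self, q_emb: torch.Tensor, v_feat: torch.Tensor) -> torch.Tensor:
        # q_emb: (B, q_dim), v_feat: (B, v_dim)
        return torch.sigmoid(self.gate(q_emb)) * v_feat


class TypeConditionedVQA(nn.Module):
    """Routes each question to a reasoning head chosen by its type,
    then predicts an answer from the fused question/visual features."""

    def __init__(self, q_dim: int = 1024, v_dim: int = 512, n_answers: int = 100):
        super().__init__()
        self.heads = nn.ModuleDict({
            "closed": QuestionConditionedReasoner(q_dim, v_dim),
            "open": QuestionConditionedReasoner(q_dim, v_dim),
        })
        self.classifier = nn.Linear(q_dim + v_dim, n_answers)

    def forward(self, q_emb, v_feat, q_types):
        # q_types: list of "closed"/"open" strings, one per example
        fused = torch.stack([
            self.heads[t](q_emb[i:i + 1], v_feat[i:i + 1]).squeeze(0)
            for i, t in enumerate(q_types)
        ])
        return self.classifier(torch.cat([q_emb, fused], dim=-1))


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive loss between two augmented views of the same unlabeled
    radiology images, used to pre-train the visual feature extractor."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau            # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```

The router here is a hard dispatch by question type; the paper's type-conditioned strategy may differ in how the conditioning is realized, and the contrastive objective is shown in its generic InfoNCE form only to indicate how unlabeled radiology images can supervise the encoder.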
