Abstract

Radiology images are an essential part of clinical decision making and population screening, e.g., for cancer. Automated systems could help clinicians cope with large amounts of images by answering questions about the image contents. An emerging area of artificial intelligence, Visual Question Answering (VQA) in the medical domain explores approaches to this form of clinical decision support. Success of such machine learning tools hinges on availability and design of collections composed of medical images augmented with question-answer pairs directed at the content of the image. We introduce VQA-RAD, the first manually constructed dataset where clinicians asked naturally occurring questions about radiology images and provided reference answers. Manual categorization of images and questions provides insight into clinically relevant tasks and the natural language to phrase them. Evaluating with well-known algorithms, we demonstrate the rich quality of this dataset over other automatically constructed ones. We propose VQA-RAD to encourage the community to design VQA tools with the goals of improving patient care.

Highlights

  • Visual question answering (VQA) is a computer vision and artificial intelligence (AI) problem that aims to answer questions about images

  • Many different techniques are applied to build Visual Question Answering (VQA) systems, including computer vision, natural language processing, and deep learning. These systems need to be trained for the task and evaluated on large data collections consisting of images and question-answer pairs directed at the content of those images

  • Although there has been great progress in image recognition in radiology[1], the datasets that enabled this progress do not generalize well to VQA because none of them include question-answer pairs directed at the images[2,3]

Background & Summary

Visual question answering (VQA) is a computer vision and artificial intelligence (AI) problem that aims to answer questions about images. Many different techniques are applied to build VQA systems, including computer vision, natural language processing, and deep learning. These systems need to be trained for the task and evaluated on large data collections consisting of images and question-answer pairs directed at the content of those images. To overcome the lack of readily available natural visual questions, earlier collections generated questions and answers automatically from the corresponding image captions. This resulted in many artificial questions that do not always make sense, to the point where a human could not work out what the question was trying to ask. Another issue with such a dataset is that its images were automatically extracted from PubMed Central articles. We demonstrate the value of VQA-RAD and its use cases by applying several well-known algorithms.
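As a rough illustration of the record structure such collections imply (a radiology image paired with a clinician's question and a reference answer), the sketch below loads and summarizes records of this shape. It is a minimal sketch only: the file name and field names ("image_name", "question", "answer", "answer_type") are assumptions for illustration, not the dataset's documented schema.

    # Minimal sketch: inspecting image/question/answer records of the kind
    # described above. Field names and file name are illustrative assumptions.
    import json
    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class VQARecord:
        image_name: str   # radiology image the question refers to
        question: str     # naturally phrased clinical question
        answer: str       # clinician-provided reference answer
        answer_type: str  # e.g. "CLOSED" (yes/no, limited choices) or "OPEN"

    def load_records(path):
        """Load image/question/answer records from a JSON list of dictionaries."""
        with open(path, encoding="utf-8") as f:
            raw = json.load(f)
        return [
            VQARecord(
                image_name=item["image_name"],
                question=item["question"],
                answer=item["answer"],
                answer_type=item.get("answer_type", "OPEN"),
            )
            for item in raw
        ]

    if __name__ == "__main__":
        records = load_records("vqa_rad.json")  # hypothetical file name
        print(f"{len(records)} question-answer pairs")
        # The answer-type distribution helps decide between classification over
        # a fixed answer set (closed questions) and free-text generation (open).
        print(Counter(r.answer_type for r in records))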

