Abstract

Visual Question Answering (VQA) systems have achieved great success in general scenarios. In the medical domain, however, VQA systems are still in their infancy because the available datasets are limited in scale and in application scenarios. Current medical VQA datasets are designed for basic analyses of medical imaging, such as modality, plane, organ system, and abnormality. They aim to provide constructive medical suggestions for doctors and contain a large number of professional terms, so they offer limited help to patients. In this paper, we introduce a new Patient-oriented Visual Question Answering (P-VQA) dataset, which supports building a VQA system for patients by covering the entire treatment process, including medical consultation, imaging diagnosis, clinical diagnosis, treatment advice, and review. P-VQA covers 20 common diseases with 2,169 medical images, 24,800 question-answer pairs, and a medical knowledge graph containing 419 entities. In terms of methodology, we propose a Medical Knowledge-based VQA Network (MKBN) that answers questions based on the images and the medical knowledge graph in P-VQA. MKBN learns two cluster embeddings (disease-related and relation-related) according to the structural characteristics of the medical knowledge graph, and three interactive features (image-question, image-disease, and question-relation) according to the characteristics of diagnosis. For comparison, we evaluate several state-of-the-art baselines on the P-VQA dataset as benchmarks. Experimental results on P-VQA demonstrate that MKBN achieves state-of-the-art performance compared with the baseline methods. The dataset is available at https://github.com/cs-jerhuang/P-VQA.
