Abstract

The main purpose of the medical report generation task is to generate a medical report for a given medical image, containing detailed information about body parts and diagnostic findings from radiologists. The task not only greatly reduces the workload of radiologists but also helps patients receive medical treatment in time. However, the task still faces many limitations. First, the gap between image semantic features and text semantic features hinders the accuracy of generated medical reports. Second, different medical images share a large number of similar features that are not utilized efficiently or adequately. To address these problems, we propose VMEKNet, a medical report generation model that integrates visual memory and external knowledge into the task. Specifically, we propose two novel modules and introduce them into medical report generation: the TF-IDF Embedding (TIE) module incorporates external knowledge into the feature extraction stage via the TF-IDF algorithm, and the Visual Memory (VIM) module makes full use of previous image features to help the model extract more accurate medical image features. A standard Transformer then processes the image and text features and generates complete medical reports. Experimental results on the benchmark IU X-Ray dataset demonstrate that our proposed model outperforms previous work on both natural language generation metrics and practical clinical diagnosis.

Keywords: Medical report generation; Transformer; TF-IDF algorithm; Visual memory
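The abstract does not detail how the TIE module computes its TF-IDF weights, but the underlying algorithm is standard. The sketch below is a minimal, self-contained illustration of TF-IDF scoring over a toy corpus of report sentences; the tokenized input format and function name are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Illustrative TF-IDF: corpus is a list of documents, each a list of
    tokens. Returns one {term: weight} dict per document, where
    weight = (term frequency in doc) * log(n_docs / doc frequency of term)."""
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy example: two tokenized report fragments.
reports = [["lung", "fields", "clear"], ["lung", "opacity", "noted"]]
w = tfidf(reports)
# "lung" occurs in every document, so its IDF (and weight) is 0;
# terms unique to one report receive positive weight.
```

Terms that appear in every report (e.g. boilerplate anatomy words) are down-weighted to zero, while report-specific findings keep positive weight, which is the property that makes TF-IDF a natural carrier of external domain knowledge.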
