Abstract
The medical report generation task aims to generate a medical report for a given medical image, containing detailed information on body parts and the diagnostic results that radiologists would provide. The task not only greatly reduces the workload of radiologists, but also helps patients receive medical treatment in time. However, the task still has many limitations. First, the gap between image semantic features and text semantic features limits the accuracy of the generated medical reports. Second, different medical images share a large number of similar features, which are not utilized efficiently or adequately. To address these problems, we propose VMEKNet, a medical report generation model that integrates visual memory and external knowledge into the task. Specifically, we propose two novel modules and introduce them into medical report generation. The TF-IDF Embedding (TIE) module incorporates external knowledge into the feature extraction stage via the TF-IDF algorithm, and the Visual Memory (VIM) module makes full use of previous image features to help the model extract more accurate medical image features. A standard Transformer then processes the image features and text features and generates full medical reports. Experimental results on the benchmark dataset IU X-Ray demonstrate that our proposed model outperforms previous works on both natural language generation metrics and practical clinical diagnosis.

Keywords: Medical report generation, Transformer, TF-IDF algorithm, Visual memory
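
For readers unfamiliar with the TF-IDF algorithm that the TIE module relies on, the following is a minimal sketch of plain TF-IDF term weighting. It is not the TIE module itself: the abstract does not specify how the weights are embedded or fused with image features, so the function name `tfidf`, the toy report sentences, and the unsmoothed `log(N / df)` variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Hypothetical helper for illustration only; uses the unsmoothed
    idf = log(N / df) variant, which is an assumption.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # Term frequency (relative count) times inverse document frequency.
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in tf.items()}
        )
    return weights


# Toy report sentences (invented, not taken from IU X-Ray).
reports = [
    "the heart size is normal".split(),
    "the lungs are clear".split(),
    "the heart is enlarged".split(),
]
scores = tfidf(reports)
```

In this sketch a word that occurs in every report, such as "the", receives weight zero, while a rarer term such as "enlarged" is weighted above "heart", which is the property that makes TF-IDF useful for highlighting informative terms.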