Abstract
Automated radiology report generation is gaining popularity as a means to alleviate the workload of radiologists and to prevent misdiagnoses and missed diagnoses. By imitating the working patterns of radiologists, previous report generation approaches have achieved remarkable performance. However, these approaches suffer from two significant problems: (1) lack of visual prior: medical observations in radiology images are interdependent and exhibit certain patterns, and the lack of such a visual prior can reduce accuracy in identifying abnormal regions; (2) lack of alignment between images and texts: the absence of annotations and alignments for regions of interest in the radiology images and reports can lead to inconsistent visual and textual features of the abnormal regions generated by the model. To address these issues, we propose a Visual Prior-based Cross-modal Alignment Network for radiology report generation. First, we propose a novel Contrastive Attention that compares the input image with normal images to extract difference information, namely the visual prior, which helps to identify abnormalities quickly. Then, to facilitate the alignment of images and texts, we propose a Cross-modal Alignment Network that leverages a cross-modal matrix, initialized with features generated by pre-trained models, to compute cross-modal responses for visual and textual features. Finally, a Visual Prior-guided Multi-Head Attention is proposed to incorporate the visual prior into the generation process. Extensive experimental results on two benchmark datasets, IU-Xray and MIMIC-CXR, demonstrate that our proposed model outperforms state-of-the-art models on almost all metrics, achieving BLEU-4 scores of 0.188 and 0.116 and CIDEr scores of 0.409 and 0.240, respectively.
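The Contrastive Attention described above can be illustrated with a minimal sketch: input-image patch features attend over a pool of normal-image features, and the attended "normal" content is subtracted to leave difference information. This is an assumption-based illustration only; the function name, tensor shapes, and scaling are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_attention(query_feats: torch.Tensor,
                          normal_feats: torch.Tensor) -> torch.Tensor:
    """Sketch of a contrastive attention step (names/shapes are assumptions).

    query_feats:  (N, d) patch features of the input radiology image
    normal_feats: (M, d) features pooled from a set of normal images
    Returns (N, d) difference features highlighting deviations from normal.
    """
    d = query_feats.size(-1)
    # Scaled dot-product attention of each input patch over normal features
    scores = query_feats @ normal_feats.T / d ** 0.5        # (N, M)
    attn = F.softmax(scores, dim=-1)
    common = attn @ normal_feats                            # (N, d) "normal" content
    # Subtracting the attended normal content leaves the visual prior:
    # the difference information that flags potentially abnormal regions.
    return query_feats - common
```

In practice such difference features would be fed, alongside the raw visual features, into the decoder's attention layers (the paper's Visual Prior-guided Multi-Head Attention serves this role).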