Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports

Hong-Yu Zhou, Yizhou Yu, Yinghao Zhang, Ruibang Luo, Liansheng Wang, Xiaoyu Chen

doi:10.1038/s42256-021-00425-9

Open Access

https://doi.org/10.1038/s42256-021-00425-9

Abstract

Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully- or self-supervised learning on a source domain; however, supervised pre-training requires a complex and labour-intensive two-stage human-assisted annotation process, whereas self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology called reviewing free-text reports for supervision (REFERS), which acquires free supervision signals from the original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on four well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels; it therefore has the potential to replace canonical pre-training methodologies.

To train machine learning models for medical imaging, large amounts of training data are needed. Zhou and colleagues instead propose a method of weak supervision that uses the information in radiology reports to learn visual features without the need for expert labelling.

Full Text