A visual-language foundation model for computational pathology.

Andrew Zhang, Anil V Parwani, Bowen Chen, Drew F K Williamson, Faisal Mahmood, Georg Gerber, Guillaume Jaume, Igor Odintsov, Ivy Liang, Long Phi Le, Ming Y Lu, Richard J Chen, Tong Ding

doi:10.1038/s41591-024-02856-4

Abstract

The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text and, notably, over 1.17 million image-caption pairs through task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving histopathology images and/or text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.

Full Text