Artificial intelligence-powered computational pathology has significantly improved the speed and precision of tumor diagnosis, and has shown substantial potential to infer genetic mutations and gene expression levels. However, current studies remain limited in predicting molecular subtypes and clinical outcomes in breast cancer. In this paper, we proposed a weakly supervised contrastive learning framework to address this challenge. Our framework first performed contrastive learning pretraining on a large number of unlabeled patches tiled from whole slide images (WSIs) to extract patch-level features. A gated attention mechanism then aggregated the patch-level features into a slide-level feature, which was applied to various downstream tasks. To confirm the effectiveness of the proposed method, we conducted evaluation experiments on three public cohorts and one external independent cohort of breast cancer. The predictive power of our model to infer gene expression, molecular subtypes, recurrence events and drug responses was validated across cohorts. In addition, the learned patch-level attention scores enabled us to generate heatmaps that were highly consistent with pathologist annotations and spatial transcriptomic data. These findings demonstrated that our model effectively established high-order genotype-phenotype associations, thereby potentially extending the application of digital pathology in clinical practice.
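
As a rough illustration of the aggregation step, the sketch below pools patch-level features into a single slide-level feature using gated attention. It assumes the standard gated attention formulation from attention-based multiple instance learning, which the abstract does not spell out. The function name `gated_attention_pool`, the weights `V`, `U` and `w`, the dimensions, and the random inputs are all hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def gated_attention_pool(patch_feats, V, U, w):
    """Pool patch features of shape (n_patches, d) into a slide feature of shape (d,).

    V, U: (d, h) projection matrices; w: (h,) scoring vector.
    All weights are illustrative stand-ins for learned parameters.
    """
    content = np.tanh(patch_feats @ V)                  # tanh branch, (n_patches, h)
    gate = 1.0 / (1.0 + np.exp(-(patch_feats @ U)))     # sigmoid gate, (n_patches, h)
    scores = (content * gate) @ w                       # one raw score per patch
    scores = scores - scores.max()                      # stabilise the softmax
    attn = np.exp(scores) / np.exp(scores).sum()        # attention weights sum to 1
    slide_feat = attn @ patch_feats                     # attention-weighted average
    return slide_feat, attn

# Toy example with random values standing in for pretrained patch features.
rng = np.random.default_rng(0)
n_patches, d, h = 50, 16, 8
patch_feats = rng.normal(size=(n_patches, d))
V = rng.normal(size=(d, h)) * 0.1
U = rng.normal(size=(d, h)) * 0.1
w = rng.normal(size=h)

slide_feat, attn = gated_attention_pool(patch_feats, V, U, w)
```

The per-patch weights in `attn` are the kind of quantity that could be mapped back to patch coordinates to draw an attention heatmap, while `slide_feat` would feed a downstream prediction head.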