Abstract

Background: The efficacy of antibody-drug conjugates (ADCs) depends on how strongly and how specifically tumor cells express the target antigen. Although H&E-stained slides are routinely collected during cancer care, specialized IHC staining is typically required to ascertain antigen expression. Such stains are not always available or readily deployed. The need to perform a separate IHC test for each candidate ADC may burden clinical labs and can hinder access to care in resource-limited settings. Here we develop an ensemble of machine learning models to accurately predict the expression of 166 distinct ADC targets directly from H&E images.

Methods: For each ADC-targeted gene, patients with copy-number amplifications (CNAs) were identified from somatic whole exome sequencing. Genes differentially expressed in patients with CNAs were identified from bulk transcriptomics. For each ADC-targeted gene, an expression signature was developed based on the expression levels of the differentially upregulated genes. Next, whole-slide, H&E-stained histopathology images were embedded into a lower-dimensional representation via a transformer model trained with self-supervised learning. Neural networks were developed to predict a patient’s probability of having a CNA in an ADC-targeted gene, as indicated by an expression signature exceeding the 90th percentile (p90). All evaluation metrics were estimated by 5-fold cross-validation, with training and evaluation performed on disjoint sets of patients.

Results: For each of the 166 ADC-targeted genes, a median of 154 patients were found to harbor a CNA, and the expression signature included a median of 180 genes. In all cases, patients with CNAs had significantly higher expression signatures than those without. For predicting likely CNA status directly from H&E histopathology images, the mean AUROC was 0.888 (95% CI, 0.876-0.900) and the mean AUPRC was 0.571 (95% CI, 0.531-0.611). Among the 166 ADC-targeted genes, the AUROC exceeded 0.95 for 31.3%, 0.80 for 80.7%, and 0.725 for 100%. The best-predicted ADC target was SLC7A5 (AUROC: 0.995 [95% CI, 0.994-0.998]; AUPRC: 0.967 [95% CI, 0.963-0.976]).

Conclusion: We have developed models that accurately predict the likely expression of ADC targets based solely on H&E images. The ability to accurately discern the presence of ADC antigens from H&E images has numerous potential applications, including cohort refinement, computer-aided diagnosis, and personalized treatment planning.

Citation Format: Zachary Ryan McCaw, Anna Shcherbina, Yajas Shah, insitro Research Team, Philip Tagari, Daphne Koller, Christopher Probert. Machine learning enables prediction of ADC targets from whole slide H&E images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6177.
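The labeling and evaluation steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the expression signature is aggregated, so the mean z-score used here is an assumption, and all function names (`expression_signature`, `p90_labels`, `patient_folds`, `auroc`) are hypothetical. The image embedding and the neural networks are omitted; the sketch covers only how a p90 label could be derived, how folds could be kept patient-disjoint, and how AUROC could be computed from predicted probabilities.

```python
import random
import statistics


def zscores(values):
    """Standardize one gene's expression values across patients."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def expression_signature(expr, signature_genes):
    """Per-patient signature score.

    expr maps gene -> list of expression values, one per patient.
    ASSUMPTION: the score is the mean z-score over the upregulated genes;
    the abstract does not specify the aggregation.
    """
    z = [zscores(expr[g]) for g in signature_genes]
    n_patients = len(z[0])
    return [statistics.fmean(col[i] for col in z) for i in range(n_patients)]


def p90_labels(scores):
    """Label a patient 1 if the signature exceeds the 90th percentile, else 0."""
    cutoff = statistics.quantiles(scores, n=10, method="inclusive")[-1]
    return [1 if s > cutoff else 0 for s in scores], cutoff


def patient_folds(patient_ids, k=5, seed=0):
    """Assign each sample a fold so that a patient never spans two folds."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in patient_ids]


def auroc(labels, preds):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a full pipeline, a classifier would be fit on the samples from four folds and scored with `auroc` on the held-out fold, repeated for each of the five folds and each target gene. Because the label is positive for roughly 10% of patients by construction, the chance-level AUPRC is about 0.10, which gives context for reading AUPRC alongside AUROC.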