Abstract

Objective: DNA amplifications of the cMET locus (METamp) are rare (1-6%) in non-small cell lung cancer (NSCLC). Efficiently identifying patients with METamp could accelerate the development of cMET-targeting therapies by speeding trial enrollment. We developed a self-supervised deep learning (SSDL) algorithm to predict METamp status from H&E-stained whole slide images (WSIs).

Methods: Three datasets (CDtrain, CDNGS, CDFISH) comprising 1,436 advanced NSCLC patients (one slide per patient) were used to develop an SSDL algorithm to predict METamp status. METamp+ status was standardized for CDtrain (n=788, METamp+ prevalence=15%) and CDNGS (n=186; 19 positive, 167 negative; prevalence=10%) as harboring ≥5 copies of the gene by next-generation sequencing (NGS), and for CDFISH (n=462; 36 positive, 426 negative; prevalence=8%) as a MET copy number (CN) gain to centromere 7 (CEN) ratio ≥2.5. The datasets were divided into a discovery set (CDtrain) for training and held-out test sets (CDNGS, CDFISH) for validation. To improve the robustness of SSDL, we first pre-trained ResNet34 with Simple Contrastive Learning on >25k WSIs from diverse sources. During the training phase, 4-fold cross-validation was applied on CDtrain. The first three layers of ResNet34 were frozen, while the final layer and an attention-based aggregation system were trained for METamp prediction. Performance was assessed on the test sets using sensitivity, specificity, AUC, positive predictive value (PPV), and negative predictive value (NPV). Model predictions were binarized using two separate thresholds selected on CDtrain, yielding a High-Sensitivity model (SSDLSens, 90% sensitivity) and a High-Specificity model (SSDLSpec, 90% specificity). Each model’s potential impact on patient enrollment is presented below.

Results: The trained SSDL models achieved an average cross-validation AUC of 0.77±0.04 across folds. The model with the highest cross-validated AUC was chosen for METamp status prediction on the test sets. The selected model maintained its predictive performance on the CDNGS (AUC=0.82) and CDFISH (AUC=0.75) test sets. The SSDLSpec model had a PPV more than twice the baseline prevalence in both CDNGS (PPV=27%) and CDFISH (PPV=29%), providing a significant enrichment in METamp+ patients. The SSDLSens model correctly classified ~55% of METamp‒ patients in CDNGS and CDFISH, with an NPV of ≥97%. These results suggest that genetic screening could potentially be reduced by approximately 50% while maintaining high sensitivity.

Conclusions: The SSDL model robustly predicted METamp status in independent patient cohorts, regardless of how METamp was defined (i.e., by NGS or FISH). We demonstrate the ability to classify patients with either high sensitivity or high specificity, which can help reduce genetic testing or surface potentially eligible patients for trials, respectively.

Citation Format: Chaitanya Parmar, Oscar M. Carrasco-Zevallos, Joshua C. Curtin, Darshana Govind, Bing Xia, S. Martin Shreeve, Songbai Wang, Timothy Jatkoe, Patricia Raciti, Levon Demirdjian, Joel Greshock, Kristopher Standish, Stephen S.F. Yip. Prediction of MET amplification from H&E images in non small cell lung cancer using self-supervised deep learning and its role in clinical trial enrollment [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 6554.
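The enrichment and screening-reduction figures reported in the Results depend on how PPV and NPV relate to sensitivity, specificity, and prevalence. The following is a minimal sketch of that standard arithmetic, not the authors' evaluation code. The function names (`binary_metrics`, `ppv_from_rates`, `npv_from_rates`) and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts of a binarized classifier."""
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # fraction of METamp+ called positive
        "specificity": tn / (tn + fp),   # fraction of METamp- called negative
        "ppv": tp / (tp + fp),           # fraction of positive calls that are METamp+
        "npv": tn / (tn + fn),           # fraction of negative calls that are METamp-
        "prevalence": (tp + fn) / n,
        # Fraction of all patients called negative; under a high-sensitivity
        # threshold, these could in principle skip confirmatory genetic testing.
        "screened_out": (tn + fn) / n,
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule, showing its dependence on prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical cohort (NOT study data): 200 patients, 20 METamp+.
    m = binary_metrics(tp=10, fp=20, tn=160, fn=10)
    enrichment = m["ppv"] / m["prevalence"]  # PPV relative to baseline prevalence
    print(f"PPV={m['ppv']:.2f}, NPV={m['npv']:.2f}, enrichment={enrichment:.1f}x")
```

Because PPV scales with prevalence, the same threshold can yield different PPVs in cohorts with different METamp+ rates, which is why enrichment is reported relative to each test set's own baseline prevalence.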