Abstract

Background: Antibody-drug conjugates (ADCs) targeting trophoblast cell-surface antigen 2 (TROP-2) and hepatocyte growth factor receptor (cMET) are promising therapies for non-small cell lung cancer (NSCLC). However, their clinical application requires robust and rapid biomarker evaluation that addresses expression heterogeneity and avoids interobserver variability. Current approaches based on pathologist assessment are limited by subjectivity and poor scalability. This study aimed to develop a generalizable AI model for ADC biomarker evaluation, trained on TROP-2 and applied at inference to cMET, to validate its adaptability across markers. Additionally, the model’s performance was compared with that of expert pathologists to assess its clinical utility. Finally, biomarker prevalence in the two main NSCLC subtypes, adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC), was evaluated.

Methods: We collected a bicentric real-world cohort of 1142 patients with resected NSCLC from the Charité, Berlin, and the University Hospital Cologne. For tissue microarray construction, two 1.5-mm tissue cores were punched from formalin-fixed, paraffin-embedded tumor blocks. Immunohistochemical staining for TROP-2 and cMET was performed, and sections were scanned for AI-based analysis. The AI pipeline comprised three models: a cell detection model to identify cells, a cell classification model to distinguish tumor cells from other cells, and an expression scoring model to quantify membranous biomarker expression. The model was trained on TROP-2 and subsequently applied at inference to cMET, enabling an evaluation of cross-marker generalization. Five pathologists with varying levels of expertise manually evaluated a representative subset using H-scoring. Their results were then compared with those of the AI model.

Results: The expression scoring model achieved a macro-averaged F1 score of 94% for TROP-2 and 91% for cMET.
Moreover, the model demonstrated excellent concordance with expert pathologists (average pairwise Pearson correlation: 93% for TROP-2 and 95% for cMET). TROP-2 expression was significantly higher in LUSC (mean H-score: 154.67) than in LUAD (mean H-score: 86.57), while cMET showed the opposite trend (mean H-score: 59.52 in LUAD; 25.68 in LUSC).

Conclusion: This study highlights the potential of AI models to address key challenges in ADC biomarker evaluation, including expression heterogeneity, interobserver variability, and time expenditure. By successfully generalizing from TROP-2 to cMET, the model demonstrates adaptability and scalability for broader clinical applications. These findings pave the way for integrating AI into clinical workflows, improving patient stratification, and optimizing ADC therapy selection. Future efforts will focus on expanding this approach to additional biomarkers and validating its utility in prospective clinical trials.

Citation Format: Philipp Anders, Philipp Erwin Seegerer, Katja Lingelbach, Suhas Pandhe, Sandip Ghosh, Cornelius Böhm, Stephan Tietz, Rosemarie Krupar, Lars Tharun, Marie-Lisa Eich, Julika Ribbat-Idel, Verena Aumiller, Sabine Merkelbach-Bruse, Alexander Quaas, Nikolaj Frost, Georg Schlachtenberger, Matthias Heldwein, Ulrich Keilholz, Khosro Hekmat, Jens-Carsten Rückert, Reinhard Büttner, David Horst, Maximilian Alber, Lukas Ruff, Frederick Klauschen, Gabriel Dernbach, Simon Schallenberg. From bench to bedside: generalizable AI model for ADC biomarker evaluation in NSCLC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3351.
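
Illustrative note on the metrics: the abstract reports H-scores, a macro-averaged F1 score, and an average pairwise Pearson correlation, but does not give formulas. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. It assumes each tumor cell is assigned a membranous intensity class of 0, 1, 2, or 3 and uses the conventional H-score definition (1 x % weak + 2 x % moderate + 3 x % strong, range 0 to 300). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def h_score(intensities):
    """H-score (0-300) from per-cell intensity classes 0, 1, 2, 3 (assumed input)."""
    if not intensities:
        raise ValueError("no tumor cells to score")
    counts = Counter(intensities)
    n = len(intensities)
    # 1 x %weak + 2 x %moderate + 3 x %strong
    return sum(k * 100.0 * counts[k] / n for k in (1, 2, 3))


def macro_f1(y_true, y_pred, labels=(0, 1, 2, 3)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted cells contributes 0 here (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_pearson(raters):
    """Average Pearson correlation over all pairs of raters."""
    pairs = list(combinations(raters, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

For example, a core whose tumor cells are split evenly across the four intensity classes yields `h_score([0, 1, 2, 3]) == 150.0`. In `mean_pairwise_pearson`, each entry of `raters` would hold one rater's H-scores over the same set of cores. Whether the AI model is included as one of the raters is not stated in the abstract.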