Evaluating generalizability of practice-changing randomized clinical trials in non-small cell lung cancer using machine learning-based in-silico trials.

Xavier Orcutt, Aaron B Cohen, Ravi Bharat Parikh, Ronac Mamtani, Arjun Sondhi

doi:10.1200/jco.2023.41.16_suppl.9130

Abstract

9130 Background: Results of randomized clinical trials (RCTs) of anticancer agents are not generalizable to many real-world patients. Advances in machine learning (ML) and the increasing availability of curated real-world data make it possible to assess generalizability by simulating trials “in-silico”. Our objective was to assess the generalizability of survival outcomes reported in 2 practice-changing phase III trials in first-line (1L) advanced non-small cell lung cancer (aNSCLC).

Methods: Our cohort included patients in the nationwide Flatiron Health EHR-derived de-identified database who were diagnosed with stage IIIB-IV or recurrent aNSCLC between 2011 and 2020. First, we trained and validated supervised ML models (gradient boosted, random forest, support vector machine, and penalized Cox) to predict 1-year survival for patients with aNSCLC; the Lung Cancer Prognostic Index (LCPI), a published disease-specific prognostic index, served as the comparator model. Models were built from 130 demographic, vital sign, laboratory, and biomarker features available at aNSCLC diagnosis. Second, we used the best-performing ML model to create 4 prognostic risk groups. Third, we simulated 2 seminal trials of 1L treatment for aNSCLC across the ML-derived risk groups, coarsely reproducing their inclusion/exclusion criteria and using inverse probability of treatment-weighted (IPTW) survival analyses. We compared median overall survival (mOS), measured from the start of 1L treatment to death and estimated with Kaplan-Meier curves, between the in-silico trials (ISTs) and the original RCTs.

Results: The cohort included 61,339 patients with aNSCLC. The best-performing model, a gradient boosted model, outperformed the LCPI (AUC 0.784 vs. 0.688). In ISTs, survival benefits of novel treatments varied across risk groups and were generally smaller than in the RCTs (Table). For high- and very high-risk patients, IST results were inconsistent with RCT survival results. For example, in KEYNOTE-024, mOS in the pembrolizumab arm was 30.0 months, whereas in the corresponding IST, mOS among patients receiving pembrolizumab ranged from 1.3 months in very high-risk patients to 41.5 months in low-risk patients. RCT results overestimated treatment effects for patients with high-risk aNSCLC.

Conclusions: ML-based ISTs can reveal heterogeneity in real-world survival outcomes associated with novel oncology treatments and identify populations for whom RCT results generalize poorly.

[Table: see text]
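The abstract does not describe how the IPTW survival analyses were implemented. As a minimal sketch only, assuming that propensity scores (the probability of receiving the novel treatment given baseline features) have already been estimated and that average-treatment-effect weights are used, a weighted Kaplan-Meier estimate and its median survival time could be computed as below. The function names (`iptw_weights`, `weighted_km`, `median_survival`) and the toy data are hypothetical and are not taken from the study; the median follows the common convention of the first time at which estimated survival falls to 0.5 or below.

```python
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration only; not the study's code.


def iptw_weights(treated: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Average-treatment-effect IPTW weights (an assumed weighting scheme).

    Treated patients (treated == 1) get 1 / ps; untreated patients get
    1 / (1 - ps). `propensity` is assumed to be a precomputed
    P(treated | baseline covariates).
    """
    weights = []
    for t, ps in zip(treated, propensity):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / ps if t == 1 else 1.0 / (1.0 - ps))
    return weights


def weighted_km(
    times: Sequence[float], events: Sequence[int], weights: Sequence[float]
) -> List[Tuple[float, float]]:
    """Weighted Kaplan-Meier estimate as (death time, survival) steps.

    events[i] == 1 marks an observed death; 0 marks a censored patient.
    """
    deaths = defaultdict(float)  # summed weight of deaths at each time
    for t, e, w in zip(times, events, weights):
        if e == 1:
            deaths[t] += w
    curve = []
    surv = 1.0
    for t in sorted(deaths):
        # Weighted number of patients still at risk just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time survival is <= 0.5; None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy data in months; purely illustrative, not study data.
    times = [2.0, 5.0, 7.5, 9.0, 12.0, 14.0]
    events = [1, 1, 0, 1, 1, 0]
    treated = [1, 0, 1, 1, 0, 0]
    propensity = [0.6, 0.4, 0.7, 0.5, 0.3, 0.45]

    weights = iptw_weights(treated, propensity)
    for arm in (1, 0):
        idx = [i for i, t in enumerate(treated) if t == arm]
        curve = weighted_km(
            [times[i] for i in idx], [events[i] for i in idx], [weights[i] for i in idx]
        )
        print(f"arm {arm}: weighted median OS = {median_survival(curve)} months")
```

In an in-silico trial of this kind, the computation would be repeated within each risk group and treatment arm so that the weighted mOS can be set against the value reported by the RCT. A production analysis would more likely rely on an established survival library (for example, lifelines' `KaplanMeierFitter.fit` accepts a `weights` argument) and would also check covariate balance after weighting and consider stabilized or trimmed weights.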
