6626 Background: Surgical resection (SR) is a guideline-recommended definitive therapy for patients (pts) with early-stage non-small cell lung cancer (eNSCLC). However, 30–50% of pts develop disease recurrence (DR) within the first 5 years (yrs) after surgery. Machine learning (ML) applied to routinely collected electronic health record (EHR) data could facilitate timely identification of pts at risk of DR who would benefit from enhanced surveillance or initial treatment (Tx) intensification. Methods: eNSCLC pts with stage IB-IIIA disease at the time of SR between January 2010 and February 2022 were identified from the ConcertAI Patient360 oncology database, which comprises structured and curated records from US-based EHRs. DR was defined as the earliest human-curated advanced/metastatic diagnosis date after SR. Gradient boosting (GB), random forest (RF), and logistic regression (LR) algorithms were trained with 6-fold nested cross-validation and compared on their ability to predict the risk of DR or death at 2 yrs from SR. Pts who did not experience DR and were lost to follow-up within 2 yrs from SR were removed. Feature importance was defined using Shapley Additive Explanation (SHAP)-derived odds ratios (OR). Due to the low prevalence of DR at 2 yrs, the area under the precision-recall curve (AUPRC) was used as the performance metric. Pts in the 1st and 4th quartiles of predicted probabilities from the GB model were defined as the low-risk and high-risk groups, respectively. Results: Among 3597 pts, median age was 68.3 yrs (IQR 12.5), 51.8% were female, and 8.6% were Black. Overall, 24.4% developed DR within 2 yrs. The GB, RF, and LR models had similar ability to predict DR at 2 yrs, with a mean hold-out set AUPRC of 0.33. In the GB model, DR prevalence at 2 yrs was 34% in the high-risk vs. 18% in the low-risk group in the hold-out set. N0 stage, T stage < 3, receipt of adjuvant immune checkpoint inhibitor (ICI), and presence of EGFR mutations were protective against DR at 2 yrs, while history of anemia and congestive heart failure (CHF) were risk factors (Table).
SHAP revealed an N stage-by-adjuvant therapy and an N stage-by-CHF interaction: benefit from adjuvant chemotherapy (CT) was limited to higher N stages, while adjuvant ICI was beneficial for all N stages. CHF was a risk factor for lower N stages but not for higher N stages. Conclusions: ML applied to structured and unstructured EHRs identified predictors of risk of DR at 2 yrs after surgery in eNSCLC and may identify high-risk pts who would benefit from enhanced surveillance plans and Tx intensification in the eNSCLC setting. The model indicated a favorable outcome from adjuvant targeted therapies in EGFR-mutated pts. [Table: see text]
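The evaluation described in the Methods (AUPRC as the metric, then low-risk vs. high-risk groups from the 1st and 4th quartiles of predicted probability) can be illustrated with a minimal sketch. This is not the study's code: the function names (`average_precision`, `quartile_risk_groups`, `prevalence_by_group`) and the toy scores and labels are hypothetical, average precision is used here as one common step-wise estimate of AUPRC, and tied scores are assumed not to occur. For context, a classifier with no discriminative ability has an expected AUPRC close to the outcome prevalence, so a reported AUPRC is best read against that baseline.

```python
from statistics import quantiles


def average_precision(y_true, y_score):
    """Average precision, a step-wise estimate of AUPRC.

    Assumes binary labels (0/1) and no tied scores.
    """
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def quartile_risk_groups(y_score):
    """Label scores in the bottom quartile 'low' and top quartile 'high'."""
    q1, _, q3 = quantiles(y_score, n=4)  # quartile cut points
    return ["low" if s <= q1 else "high" if s >= q3 else "mid" for s in y_score]


def prevalence_by_group(y_true, groups):
    """Observed event rate within each risk group."""
    out = {}
    for g in set(groups):
        labels = [y for y, gi in zip(y_true, groups) if gi == g]
        out[g] = sum(labels) / len(labels)
    return out


# Hypothetical toy data (NOT study data): predicted risks and 2-yr outcomes.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 0, 1, 0, 1]

ap = average_precision(toy_labels, toy_scores)
groups = quartile_risk_groups(toy_scores)
prevalence = prevalence_by_group(toy_labels, groups)
```

In practice these quantities would be computed on each outer hold-out fold of the nested cross-validation and then averaged, so that hyperparameter tuning in the inner folds does not leak into the reported metric.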