Abstract

e13566

Background: Immune checkpoint inhibitors (ICIs) have improved outcomes in several tumor types, allowing subgroups of patients to live longer, higher-quality lives. However, potentially life-threatening immunotoxicities, including pneumonitis, can arise in susceptible patients. Identifying patients at high risk of immunotoxicity can help patients understand potential adverse events, improve clinical trial cohort selection, and inform therapy selection in clinical settings. Here, we use electronic health record (EHR) data to build a binary classification model that predicts the probability of developing pneumonitis after the first ICI administration.

Methods: We used real-world EHR-derived structured and unstructured data, obtained prior to December 31, 2018, from > 2,700 patients at Vanderbilt University Medical Center. Expert curators transformed the unstructured data into structured variables, including labels for pneumonitis episodes following ICI initiation. Feature engineering involved aggregating lab measurements over a 60-day window before the first ICI administration; other features (conditions, smoking status, etc.) used a 1-year window. To build a small, easily deployable model and assess its performance robustly, we used a sequential process. In each step, we chose between two versions of a random forest model: one with the current feature set (M1) and one extended with a candidate feature (M2). We identified candidate features using 90% of the data. We performed nested cross-validation on this partition and compared the inner-loop results. If M2 was significantly better, we tested whether it also performed better on the held-out 10% partition. If it did, we chose M2 and assessed its performance on the outer loop. We designed this procedure because our dataset was relatively small and noisy, as is typical of EHR-derived data.

Results: All-cause pneumonitis incidence following ICI initiation was 8.4%. Our final model includes only six features: frequency of lung-related ICD-10 codes, frequency of the C34 code, frequency of the C78 code, smoking status, the interaction between smoking and the C34/C78 indicators, and median blood oxygen saturation. This model achieved a mean AUC of 0.66 (SD: 0.07). Our analysis of the outer-loop predictions showed that selecting the 50% of patients with the lowest predicted probabilities reduced the incidence of pneumonitis in the selected cohort to 5%, compared with 8.4% under random selection. The model achieved a mean positive predictive value of 0.3 and a mean negative predictive value of 0.96.

Conclusions: We used a real-world EHR dataset to identify patterns in patient medical history that could predict the development of pneumonitis. We demonstrated that a small number of easily obtainable clinical covariates can yield meaningful predictions. This model illustrates a potential future use in identifying the patients at highest and lowest risk of pneumonitis during treatment.
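
The cohort-selection and predictive-value figures reported in the Results can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy probabilities and labels, and the 0.25 decision threshold are all hypothetical, and the abstract does not state which threshold was used for the positive and negative predictive values. The sketch only shows how such quantities are computed from predicted probabilities and observed outcomes.

```python
from typing import List, Tuple


def incidence(labels: List[int]) -> float:
    """Fraction of patients with the event (1 = pneumonitis, 0 = none)."""
    return sum(labels) / len(labels)


def lowest_risk_subset(probs: List[float], labels: List[int],
                       fraction: float = 0.5) -> List[int]:
    """Labels of the `fraction` of patients with the lowest predicted risk."""
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0])
    keep = max(1, int(len(ranked) * fraction))
    return [label for _, label in ranked[:keep]]


def ppv_npv(probs: List[float], labels: List[int],
            threshold: float) -> Tuple[float, float]:
    """Positive and negative predictive value at a given probability threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        if p >= threshold:      # predicted high risk
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:                   # predicted low risk
            if y == 1:
                fn += 1
            else:
                tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical toy data: 10 patients, NOT taken from the study.
toy_probs = [0.02, 0.05, 0.07, 0.10, 0.12, 0.15, 0.20, 0.30, 0.45, 0.60]
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

overall = incidence(toy_labels)
low_risk = incidence(lowest_risk_subset(toy_probs, toy_labels, fraction=0.5))
ppv, npv = ppv_npv(toy_probs, toy_labels, threshold=0.25)  # assumed threshold
print(f"overall incidence: {overall:.2f}, lowest-risk half: {low_risk:.2f}")
print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")
```

In the study, the analogous comparison is between the 8.4% incidence in the full cohort and the 5% incidence in the half of patients with the lowest predicted probabilities.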
