Radiation esophagitis is a common adverse event during chemoradiotherapy (CRT) and can adversely affect survival. This study aimed to develop interpretable machine learning (ML) models to predict grade 3 or higher radiation esophagitis in patients receiving definitive CRT for locally advanced non-small cell lung cancer (LA-NSCLC). A total of 335 patients with LA-NSCLC who received definitive concurrent CRT at a single institution from 2017 to 2021 were retrospectively identified. Esophagitis was identified and graded according to CTCAE v5.0. For each patient, 31 clinical features and 1093 dose-volume histogram (DVH) parameters from 19 structures were collected. The data were then randomly split into training (n = 233) and testing (n = 102) datasets. Feature selection was performed on the training dataset using the minimum redundancy maximum relevance algorithm, which finds a set of relevant features while controlling for redundancy among them, followed by the Boruta algorithm to remove unimportant features and improve model accuracy. The synthetic minority oversampling technique was used to address class imbalance by generating synthetic samples for the minority class. Four variants of the generalized additive model (GAM), namely Explainable Boosting Machine (EBM), neural GAM (NODE-GAM), eXtreme Gradient Boosting (XGB)-GAM, and Spline, were built with the selected features. The models' performance in predicting esophagitis was evaluated on the test dataset using the area under the receiver operating characteristic curve (AUC), F1 score, and accuracy. Shape plots were used to interpret the models' output and explain each selected feature's contribution to the prediction.
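To make the oversampling step concrete, the following is a minimal pure-Python sketch of the basic idea behind the synthetic minority oversampling technique: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy feature vectors are illustrative assumptions, not the implementation or data used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper illustrating the core SMOTE idea; it is not the
    study's code and omits refinements found in library implementations.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (made-up values, two features each).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic_samples = smote(minority_samples, n_new=10, k=2, seed=1)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay within the region already occupied by that class. Oversampling of this kind is normally applied to the training data only, so that the test set keeps its original class balance.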
NODE-GAM yielded the highest performance (F1 score = 0.57, accuracy = 0.8, and AUC = 0.837), followed by EBM (F1 score = 0.43, accuracy = 0.8, and AUC = 0.7), Spline (F1 score = 0.42, accuracy = 0.74, and AUC = 0.737), and XGB-GAM (F1 score = 0.42, accuracy = 0.76, and AUC = 0.71). Selected features included D95% [Gy], D90% [Gy], D65% [Gy], and V40Gy [%] for the esophagus, V10Gy [%] for the pulmonary artery, and the distance from GTVn to the ascending aorta. Analysis of the selected features indicated that an increased radiation dose delivered to the esophagus and a shorter distance between the ascending aorta and GTVn were associated with a higher risk of developing esophagitis. Our study demonstrates the feasibility of developing interpretable ML models to predict esophagitis in patients with LA-NSCLC treated with CRT. NODE-GAM achieved the best predictive performance while providing insights into the driving dosimetric factors that could be used to guide optimal RT planning.
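As a reading aid for the reported metrics, the sketch below computes AUC, accuracy, and F1 score from scratch on a small imbalanced toy example. The labels, scores, the 0.5 decision threshold, and the function names are made-up assumptions for illustration and are not taken from the study.

```python
def auc_score(y_true, scores):
    """AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, scores, threshold=0.5):
    """Threshold the scores, then return (accuracy, F1) for the positive class."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy imbalanced example: 8 negatives, 2 positives (made-up values).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.1, 0.4, 0.2, 0.3, 0.7, 0.4]

auc = auc_score(y_true, scores)
accuracy, f1 = accuracy_and_f1(y_true, scores)
```

In this toy case one of the two positives is missed and one negative is flagged, so accuracy stays high while F1 is noticeably lower. That pattern is typical when the positive class is rare, which is one reason to read accuracy alongside F1 and AUC.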