Literature-based Prediction of Radiation Pneumonitis in Patients with Non–Small-Cell Lung Cancer

E. Ju, S. Lee, K.H. Kim, H.J. Lee, K.H. Chang, Y.J. Cao, J.B. Shim, N.K. Lee, D.S. Yang, W.S. Yoon, Y.J. Park, C.Y. Kim

doi:10.1016/j.ijrobp.2019.06.1380

Abstract

Because retrospectively analyzed data are scarce, collecting high-quality training data is difficult. This limitation might be overcome by mining data from the clinical literature. The aim of this study was to predict radiation pneumonitis (RP) in patients with non-small-cell lung cancer (NSCLC) before radiotherapy, using a machine learning algorithm based on previously published literature from around the world. Fifty published articles related to radiation pneumonitis were structured through semantic data mining using the Konan Analytics program. The target variable was RP grade 0 to 5 according to the National Cancer Institute Common Toxicity Criteria version 3.0. The predictor variables were 10 factors: interstitial lung disease, chronic obstructive pulmonary disease, pulmonary function, age, concurrent chemotherapy, tumor location, mean lung dose, V15, V20, and V30. Support vector regression was used as the machine learning algorithm to predict RP. The accuracy of the regression model was expressed as the root mean square error (RMSE) between the predicted and actual values. To evaluate the literature-based predictions, they were compared with predictions based on data from 110 actual patients. Semantic data mining of the literature generated a total of 39,404 cases of patients with NSCLC. The literature-based prediction showed an RMSE of 1.307. The difference in RMSE between the literature-based prediction (RMSE = 1.307) and the patient-data-based prediction (RMSE = 1.056) was 0.061. These results indicate that literature data can be used to predict radiation toxicity before radiotherapy for personalized treatment, since there was no significant difference between the prediction model using literature data and the one using patient data.
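
The RMSE metric used above to score the regression model can be illustrated with a minimal sketch. The function name `rmse` and the sample grades below are hypothetical and chosen only for illustration; they are not data from the study, and this is not the authors' implementation.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one pair of values is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical RP grades (0 to 5) and regression outputs, for illustration only.
observed_grades = [0, 1, 2, 3, 1]
predicted_grades = [1.0, 1.0, 1.0, 2.0, 3.0]

print(round(rmse(observed_grades, predicted_grades), 3))
```

Because the target is an ordinal grade from 0 to 5, an RMSE near 1 corresponds to a typical prediction error of about one grade; the metric penalizes large misses more heavily than small ones because residuals are squared before averaging.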

Full Text