The aim of our study was to assess overall survival rates for colorectal cancer at 3 years and to identify strong prognostic factors among patients in Morocco using an interpretable machine learning approach. The approach is based on a fully non-parametric random survival forest (RSF), incorporating variable importance and partial dependence effects. The data came from a retrospective study of 343 patients diagnosed and followed at Hassan II University Hospital. Covariates were selected using permutation-based variable importance, and partial dependence plots were used to explore in depth the relationship between the estimated partial effect of a given predictor and survival rates. Predictive performance was measured by two metrics, the Concordance Index (C-index) and the Brier Score (BS). Overall survival rates at 1, 2 and 3 years were, respectively, 87% (SE = 0.02; 95% CI 0.84–0.91), 77% (SE = 0.02; 95% CI 0.73–0.82) and 60% (SE = 0.03; 95% CI 0.54–0.66). In the Cox model, after adjustment for all covariates, sex and tumor differentiation had no significant effect on prognosis, whereas tumor site did. The variable importance obtained from RSF supports surgery, stage, insurance, residency, and age as the most important prognostic factors. The C-index, which measures discriminative capacity, was 0.771 for the Cox PH model and 0.798 for RSF, while the BS, which measures prediction error, was 0.257 for the Cox PH model and 0.207 for RSF. With a higher C-index and a lower BS, RSF had both better discriminative capacity and better predictive accuracy. Our results show that patients who are older than 70, live in rural areas, have no health insurance, are diagnosed at a distant stage and have not had surgery constitute a subgroup with poor prognosis.
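For readers unfamiliar with the two metrics, the following is a minimal Python sketch of how a C-index and a Brier score can be computed for right-censored data. It is an illustration only, not the code used in the study: the function names, the toy data and the 15-unit horizon are all hypothetical, and the Brier score shown is a simplified, unweighted variant that skips subjects censored before the horizon, whereas published analyses usually apply inverse-probability-of-censoring weights.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter time is an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(times, events, surv_probs, horizon):
    """Unweighted Brier score at `horizon` (simplification: no IPCW)."""
    sq_errors = []
    for t, e, s in zip(times, events, surv_probs):
        if t <= horizon and not e:
            continue  # censored before the horizon: true status unknown
        alive = 1.0 if t > horizon else 0.0
        sq_errors.append((alive - s) ** 2)
    if not sq_errors:
        raise ValueError("no subjects with known status at the horizon")
    return sum(sq_errors) / len(sq_errors)


# Hypothetical toy data: follow-up time, event flag (1 = death, 0 = censored).
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.5, 0.1]        # higher = worse predicted prognosis
surv_at_15 = [0.2, 0.6, 0.5, 0.8, 0.9]  # predicted P(survival beyond t = 15)

print(concordance_index(times, events, risk))
print(brier_score(times, events, surv_at_15, horizon=15))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a lower Brier score indicates smaller prediction error, which is why a higher C-index and a lower BS both favour a model.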