Abstract
Risk assessment is critical for the detection and treatment of colorectal cancer. We developed and validated a screening framework with interpretable feature selection to identify high-risk populations and recommend colonoscopy for them. The framework was developed on a training cohort of 1,252,605 participants who underwent colonoscopy in Shanghai from 2013 to 2015. We incorporated Shapley additive explanation values into feature selection to make the framework interpretable. Two sampling methods were applied separately to mitigate potential model bias caused by class imbalance. We then constructed risk assessment models with several machine learning algorithms and compared their performance. The screening models were tested on an external validation cohort of 359,462 samples, followed by comprehensive evaluation and statistical analysis of the validation results. In external validation, the models in the proposed framework achieved sensitivity above 0.734, specificity above 0.790, and area under the receiver operating characteristic curve ranging from 0.808 to 0.859. In the predictions of the best-performing model, the prevalence of colorectal cancer was 0.059% in the low-risk group and 1.056% in the high-risk group. If colonoscopy were performed only on the high-risk group predicted by the model, 14.36% of total colonoscopies would suffice to detect 74.86% of colorectal cancer cases. We developed and validated a novel framework to identify populations at high risk for colorectal cancer. Those classified as high risk should undergo colonoscopy for further diagnosis.
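The reported figures can be cross-checked with simple arithmetic. The sketch below is not part of the study's code. It assumes that "total colonoscopies" refers to the external validation cohort and that the high-risk group makes up 14.36% of it. All variable names are illustrative. Under those assumptions, the two reported prevalence rates imply that about three quarters of cases fall in the high-risk group, which agrees with the reported 74.86% up to rounding of the published prevalences.

```python
# Figures quoted in the abstract (best-performing model, external validation).
n_total = 359_462          # validation samples (assumed to equal total colonoscopies)
high_risk_share = 0.1436   # fraction of the cohort flagged as high risk
prev_high = 0.01056        # colorectal cancer prevalence in the high-risk group
prev_low = 0.00059         # colorectal cancer prevalence in the low-risk group

# Group sizes implied by the high-risk share (approximate, not reported counts).
n_high = n_total * high_risk_share
n_low = n_total - n_high

# Expected number of cases in each group.
cases_high = n_high * prev_high
cases_low = n_low * prev_low

# Share of all cases captured by scoping only the high-risk group.
detected_share = cases_high / (cases_high + cases_low)

# How much more prevalent cancer is in the high-risk group than in the low-risk group.
prevalence_ratio = prev_high / prev_low

print(f"implied high-risk group size: {n_high:,.0f}")
print(f"implied share of cases detected: {detected_share:.1%}")
print(f"high- vs low-risk prevalence ratio: {prevalence_ratio:.1f}x")
```

The detected share depends only on the high-risk fraction and the two prevalences, not on the cohort size, so the check holds even if the assumption about "total colonoscopies" is off.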