Abstract

The incidence of colorectal cancer (CRC) is increasing in adults younger than 50, and early screening remains challenging due to cost and under-utilization. To identify individuals aged 35-50 years who may benefit from early screening, we developed a prediction model using machine learning and electronic health record (EHR)-derived factors. We enrolled 3,116 adults aged 35-50 who were at average risk for CRC and underwent colonoscopy between 2017 and 2020 at a single center. Prediction outcomes were (1) CRC and (2) CRC or high-risk polyps. We derived our predictors from EHRs (e.g., demographics, obesity, laboratory values, medications, and zip code-derived factors). We constructed four machine learning-based models using a training set (random sample of 70% of participants): regularized discriminant analysis, random forest, neural network, and gradient boosting decision tree. In the testing set (remaining 30% of participants), we measured predictive performance by comparing C-statistics to a reference model (logistic regression). The study sample was 55.1% female and 32.8% non-white, and included 16 (0.5%) CRC cases and 478 (15.3%) cases of CRC or high-risk polyps. All machine learning models predicted CRC with higher discriminative ability than the reference model [e.g., C-statistics (95%CI); neural network: 0.75 (0.48-1.00) vs. reference: 0.43 (0.18-0.67); P = 0.07]. Furthermore, all machine learning approaches, except for gradient boosting, predicted CRC or high-risk polyps significantly better than the reference model [e.g., C-statistics (95%CI); regularized discriminant analysis: 0.64 (0.59-0.69) vs. reference: 0.55 (0.50-0.59); P<0.0015]. The most important predictive variables in the regularized discriminant analysis model for CRC or high-risk polyps were income per zip code, the colonoscopy indication, and body mass index quartiles. Machine learning can predict CRC risk in adults aged 35-50 using EHR data with improved discrimination. Further development of our model is needed, followed by validation in a primary-care setting, before clinical application.
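The evaluation described in the abstract rests on two mechanical steps: a random 70%/30% split of participants and a C-statistic computed on the held-out 30%. The sketch below illustrates both in plain Python. It is not the authors' code: the function names `c_statistic` and `split_70_30`, the random seed, and the toy labels and scores are all assumptions made for illustration. The C-statistic is computed from its definition, the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half.

```python
import random


def c_statistic(labels, scores):
    """Concordance (C-statistic) for a binary outcome.

    labels: 1 for a case, 0 for a non-case.
    scores: predicted risks, in the same order as labels.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0   # case ranked above non-case
            elif c == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(non_cases))


def split_70_30(n_participants, seed=0):
    """Randomly assign participant indices to a 70% training / 30% testing set.

    The seed is a hypothetical choice; the abstract does not state one.
    """
    indices = list(range(n_participants))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.7 * n_participants))
    return indices[:cut], indices[cut:]


# Toy data invented for illustration only (not study data).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
print(f"C-statistic: {c_statistic(labels, scores):.3f}")  # 6 of 9 pairs concordant

train_idx, test_idx = split_70_30(3116)
print(len(train_idx), len(test_idx))
```

A C-statistic of 0.5 corresponds to chance-level ranking, which helps when reading the reported values: the reference model's 0.43 for CRC is no better than chance, and the wide confidence intervals for that outcome reflect the small number of CRC cases.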

Highlights

  • Colorectal cancer (CRC) is the most common gastrointestinal cancer, affecting over 150,000 adults in the U.S. each year

  • The present study builds on and extends previous reports by demonstrating the superior ability of modern machine learning approaches to predict colorectal cancer (CRC) (AUC of 0.75 for the neural network model) and CRC or high-risk polyps (AUC of 0.64 for the regularized discriminant analysis model) compared to conventional logistic regression, using variables routinely available in electronic health records (EHRs). In practical terms, and based on the improved positive prediction of CRC or high-risk serrated or adenomatous polyps (21.9% in the neural network model compared to conventional regression) observed in our study, we suggest that a combination of machine learning-based risk assessment and fecal immunochemical testing (FIT) could offer a cost-effective early screening strategy for adults under the age of 50 and would help to reduce the burden of colonoscopy referrals on the healthcare system

  • We did not observe an improvement in the predictive power of any of the models. In this analysis of data routinely collected in EHRs for clinical purposes, we demonstrated that machine learning has a superior ability to predict the risk of colorectal neoplasia in adults aged 35–50 compared to conventional logistic regression

Introduction

Colorectal cancer (CRC) is the most common gastrointestinal cancer, affecting over 150,000 adults in the U.S. each year. Despite declining CRC incidence and mortality in older adults due to effective screening, CRC incidence and mortality are rising in adults younger than 50 years of age [1–3]. There is an urgent need to establish novel and targeted CRC screening strategies for young adults that are cost-effective and easy to implement. Such efforts are challenged by the perceived lower risk among young adults and medical providers, even when gastrointestinal symptoms are present [7–9].
