Abstract

Although colorectal cancer (CRC) ranks third in both prevalence and mortality among cancers in the United States, there is no effective method to screen the general public for CRC risk. To identify an effective mass screening method for CRC risk, we evaluated seven supervised machine learning algorithms: linear discriminant analysis, support vector machine, naive Bayes, decision tree, random forest, logistic regression, and artificial neural network. Models were trained and cross-tested with the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial datasets. Six methods were used to handle missing data: mean, Gaussian, Lorentzian, one-hot encoding, Gaussian expectation-maximization, and listwise deletion. Among all combinations of model configuration and imputation method, the artificial neural network with expectation-maximization imputation performed best, with a concordance of 0.70 ± 0.02, sensitivity of 0.63 ± 0.06, and specificity of 0.82 ± 0.04. In stratifying CRC risk in the NHIS and PLCO datasets, only 2% of negative cases were misclassified as high risk and 6% of positive cases were misclassified as low risk. In modeling the CRC-free probability with Kaplan-Meier estimators, the low-, medium-, and high-risk groups showed statistically significant separation. Our results indicate that the trained artificial neural network can be used as an effective screening tool for early intervention and prevention of CRC in large populations.
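To make the three reported metrics concrete, the following is a minimal Python sketch of how concordance, sensitivity, and specificity can be computed from predicted risk scores. It is not the authors' code: the toy scores, labels, and the 0.5 threshold are invented for illustration, and the function names are hypothetical. The last function implements the commonly cited Hanley and McNeil (1982) standard error for a concordance value; whether it matches the uncertainty calculation used in the paper is an assumption.

```python
import math


def concordance(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive case
    receives the higher score; tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil (1982) standard error of a concordance (AUC) estimate."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Toy data (invented): 3 positive cases and 5 negative cases.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = concordance(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
se = hanley_mcneil_se(auc, n_pos=3, n_neg=5)
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; on survey-scale data a rank-based computation would be the more practical choice.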

Highlights

  • Of all new cancer incidences in the United States, 8.1% are colorectal cancer (CRC) (Falco et al, 2018; National Cancer Institute, 2018)

  • The standard deviation was formed from the variance from cross-testing between the National Health Interview Survey (NHIS) and PLCO datasets, and the variance from screened/unscreened subpopulations (Hanley and McNeil, 1982) using Equation (3)

  • Our logistic regression (LR) model was implemented as our artificial neural network (ANN) with no hidden layers, suggesting the importance of interfactor coupling, possibly corresponding to complications
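The last highlight describes logistic regression as the neural network with its hidden layers removed. The sketch below illustrates that equivalence under stated assumptions: a single sigmoid output unit, tanh hidden activations, and invented weights and inputs. None of these details come from the paper, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def ann_forward(x, hidden_layers, output_layer):
    """Pass x through zero or more tanh hidden layers, then one sigmoid unit."""
    h = x
    for weights, biases in hidden_layers:
        h = dense(h, weights, biases, math.tanh)
    out_weights, out_bias = output_layer
    return dense(h, [out_weights], [out_bias], sigmoid)[0]


def logistic_regression(x, coefs, intercept):
    """Standard logistic regression: sigmoid of a linear combination."""
    return sigmoid(sum(c * v for c, v in zip(coefs, x)) + intercept)


# Invented example: three standardized risk factors and arbitrary weights.
x = [0.5, 1.0, -2.0]
coefs = [0.8, -0.3, 0.1]
intercept = -1.2

# With no hidden layers, the network reduces exactly to logistic regression.
p_ann = ann_forward(x, hidden_layers=[], output_layer=(coefs, intercept))
p_lr = logistic_regression(x, coefs, intercept)

# With one hidden layer, the output depends on nonlinear combinations of
# the inputs, which is where coupling between risk factors can enter.
hidden = [([[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]], [0.0, 0.0])]
p_hidden = ann_forward(x, hidden_layers=hidden, output_layer=([0.4, -0.7], 0.1))
```

Because the zero-hidden-layer network is purely additive in its inputs, any performance gap between it and a network with hidden layers can be read as evidence that interactions among factors carry predictive signal, which is the interpretation the highlight points to.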

Introduction

Of all new cancer incidences in the United States, 8.1% are colorectal cancer (CRC) (Falco et al, 2018; National Cancer Institute, 2018). The United States Preventive Services Task Force (USPSTF) recommends screening for individuals between ages 50 and 75, while the American Cancer Society recommends screening for individuals between ages 45 and 75 (Collins et al, 2015; Bibbins-Domingo et al, 2016). Both guidelines recommend screening for anyone with one or more first-degree relatives who have ever had CRC. While screening according to these guidelines saves lives, high-risk individuals who have no family history of CRC and/or are aged 18–49 fall outside them and would benefit from a model that better detects their risk. It is therefore important to develop an effective method to estimate CRC risk non-invasively and cost-effectively.
