Abstract

Background: South Africa (SA) has the highest incidence of colorectal cancer (CRC) in Sub-Saharan Africa (SSA). However, research on CRC recurrence and survival in SA is limited, and reported recurrence and overall survival vary widely across studies. Accurate prediction of patients at risk can inform clinical expectations and decisions within the South African CRC patient population. We explored the feasibility of integrating statistical and machine learning (ML) algorithms to achieve higher predictive performance and more interpretable findings.

Methods: We selected and compared six algorithms: logistic regression (LR), naïve Bayes (NB), C5.0, random forest (RF), support vector machine (SVM) and artificial neural network (ANN). Features commonly selected by OneR and information gain, within 10-fold cross-validation, were used for model development. The validity and stability of the predictive models were further assessed using simulated datasets.

Results: The six algorithms achieved high discriminative accuracies (AUC-ROC). ANN achieved the highest AUC-ROC for recurrence (87.0%) and survival (82.0%), and the other models performed comparably to ANN. We observed no statistically significant difference in the performance of the models. Features including radiological stage and patient's age, histology, and race are risk factors for CRC recurrence and patient survival, respectively.

Conclusions: Consistent with other studies and what is known in the field, we have affirmed important predictive factors for recurrence and survival using rigorous procedures. Outcomes of this study can be generalised to CRC patient populations elsewhere in SA and in other SSA countries with similar patient profiles.
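The two feature-ranking criteria named in the Methods, information gain and OneR, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the feature names `stage` and `sex` are hypothetical, and the cross-validation step is omitted for brevity.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def group_by_value(feature, labels):
    """Collect the class labels observed for each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    return groups


def information_gain(feature, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    groups = group_by_value(feature, labels)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def one_r_accuracy(feature, labels):
    """Training accuracy of the OneR rule: predict the majority class per feature value."""
    groups = group_by_value(feature, labels)
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return correct / len(labels)


# Toy, made-up data (NOT from the study): 1 = recurrence, 0 = no recurrence.
recurrence = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "stage": ["late", "late", "late", "early", "early", "early", "late", "early"],
    "sex": ["m", "f", "m", "f", "m", "f", "f", "m"],
}

# Rank the hypothetical features by information gain, highest first.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], recurrence),
    reverse=True,
)

for name in ranking:
    ig = information_gain(features[name], recurrence)
    acc = one_r_accuracy(features[name], recurrence)
    print(f"{name}: information gain = {ig:.3f}, OneR accuracy = {acc:.3f}")
```

In a setup like the one described, both scores would be computed inside each cross-validation fold, and only features ranked highly by both criteria would be carried forward to model development.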

Highlights

  • Colorectal cancer (CRC) is the third most common cancer and the fourth leading cause of cancer-related death [1]

  • The Colorectal Cancer in South Africa (CRCSA) study was the first prospective study designed to describe the clinical presentation, demographics, risk factors, treatment, and outcomes according to population group, from both private and state health-care facilities in Johannesburg, SA [2]

  • The study sites were Charlotte Maxeke Johannesburg Academic Hospital (CMJAH), Chris Hani Baragwanath Academic Hospital (CHBAH), Wits Donald Gordon Medical Centre (WDGMC), and Edenvale Hospital, private and public hospitals that serve many urban dwellers in the Johannesburg metropole

Summary

Introduction

Colorectal cancer (CRC) is the third most common cancer and the fourth leading cause of cancer-related death [1]. CRC incidence varies significantly, with high-income countries having a higher risk of CRC than low-middle-income countries (LMICs). However, this may not reflect the true burden of cancer in LMICs, because most LMICs lack cancer registries [2]. South Africa (SA) has the highest incidence of CRC in Sub-Saharan Africa (SSA), and CRC is among the most commonly diagnosed cancers in South African men and women [4]. Accurate prediction of patients at risk can inform clinical expectations and decisions within the South African CRC patient population. We explored the feasibility of integrating statistical and machine learning (ML) algorithms to achieve higher predictive performance and more interpretable findings.

