Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database

Mi Du, Murthy N Mittinty, Dandara G Haag, John W Lynch

doi:10.3390/cancers12102802

Open Access

https://doi.org/10.3390/cancers12102802

Journal: Cancers	Publication Date: Sep 29, 2020
Citations: 45	License type: CC BY 4.0

Affiliation: University of Adelaide, University of Bristol

Abstract

Simple Summary: Formulating accurate survival prediction models of oral and pharyngeal cancers (OPCs) is important, as they might influence the decisions of clinicians and patients. Improving the quality of these clinical prediction modelling studies can improve the reliability of the developed models and facilitate their implementation in clinical practice. Given the growing use of machine learning methods in cancer research, we present the use of popular tree-based machine learning algorithms and compare them to the standard Cox regression for predicting OPC survival. The predictive models discussed here are based on a large cancer registry dataset incorporating various prognostic factors and different forms of bias. The comparable predictive performance of the Cox and tree-based models suggested that these machine learning algorithms provide non-parametric alternatives to Cox regression and are of clinical use for estimating the survival probability of OPC patients.

This study aims to demonstrate the use of tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and to compare their performance with the traditional Cox regression. Data on a total of 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazard models (Cox)), were used to develop the survival prediction models. To handle missing values in the predictors, we applied the substantive model compatible version of the fully conditional specification imputation approach to the Cox model, whereas we used RF to impute missing data for the ST, RF and CF models. For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Following this, model performance was evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with the complete cases, the C-index values in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed in the 5-year survival prediction models, with C-index values for Cox, ST, RF and CF of 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in the development datasets. The prediction error curves based on IBS showed a similar pattern for these models. The predictive performance remained unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use. In conclusion, compared to Cox regression, ST had lower, and RF and CF had higher, predictive accuracy in predicting 3- and 5-year OPC survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression that are of clinical use for estimating the survival probability of OPC patients.
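
The abstract compares models by their C-index. As an illustration only, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. It is not the authors' implementation, which is not described in this excerpt. The function name `harrell_c_index` and the toy data are hypothetical, and tied survival times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs in which the patient with the
    shorter observed survival time also has the higher predicted risk.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(times, events, risk))  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of roughly 0.69 to 0.85 should be read.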

Highlights

Oral and pharyngeal cancers (OPCs) are ranked as the ninth most prevalent type of cancers [1]
We found that all models performed better than the default benchmark Kaplan
In an attempt to contribute to clinical decision-making, we have developed a web-based oral and pharyngeal cancers (OPCs) survival probability calculator based on a Cox regression model

Summary

Introduction

Oral and pharyngeal cancers (OPCs) are ranked as the ninth most prevalent type of cancer [1]. In response to the need to improve medical care delivery in the oral health field, clinical decision support tools are being developed to aid the early detection, diagnosis, treatment and prognosis of oral diseases. When assessed against the up-to-date bias assessment criteria (PROBAST, the Prediction model Risk Of Bias ASsessment Tool) [6] and reporting guidelines (TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) [7], the overall quality of oral health prediction modelling studies was found to be less than optimal due to the presence of multiple sources of bias (e.g., measurement error, unmeasured predictors) and a lack of reporting transparency [8].