The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study.

Jialong Xiao, Miao Mo, Ying Zheng, Jie Shen, Zezhou Wang, Yulian He, Jing Yuan, Changming Zhou

doi:10.2196/33440


Open Access


Abstract

Background
In recent years, machine learning methods have been increasingly explored for cancer prognosis, driven by the emergence of improved algorithms that can use censored data for modeling, such as support vector machines for survival analysis and random survival forests (RSF). However, it is still debated whether traditional models (Cox proportional hazards regression) or machine learning-based prognostic models achieve better predictive performance.

Objective
This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and on Cox regression.

Methods
This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized at Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%), which were used to develop and evaluate 4 models predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of the models was evaluated with the concordance index (C-index), the time-dependent area under the curve (AUC), and the D-index; their calibration was evaluated with the Brier score.

Results
The RSF model showed the best discriminative performance among the 4 models, with 3-year, 5-year, and 10-year time-dependent AUCs of 0.857, 0.838, and 0.781, a D-index of 7.643 (95% CI 6.542, 8.930), and a C-index of 0.827 (95% CI 0.809, 0.845). In tests of the difference in C-index, the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). The 4 models' 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094; all were below 0.1, indicating good calibration for every model. In terms of feature importance, both elastic net and RSF identified TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter as the 5 most important features for predicting breast cancer prognosis. Finally, an online tool was developed to predict the overall survival of patients with breast cancer.

Conclusions
The RSF model slightly outperformed the other models in discriminative ability, suggesting that RSF is a potentially effective approach for building prognostic prediction models in survival analysis.
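The abstract reports 3-, 5-, and 10-year Brier scores. With censored follow-up, the squared error at a horizon t cannot be computed directly for patients censored before t, so a common choice is the inverse-probability-of-censoring-weighted (IPCW) estimator of Graf et al., which reweights the remaining patients by a Kaplan-Meier estimate of the censoring distribution. The paper does not state which estimator it used; the Python below is a from-scratch sketch under that assumption, and the function names `km_censoring_survival` and `ipcw_brier_score` are hypothetical.

```python
from typing import Callable, Sequence


def km_censoring_survival(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate G(s) = P(C > s) of the censoring distribution.

    Censorings (event == 0) play the role of 'events' here. The returned
    function gives G(s) (right-continuous) or, with left=True, the left
    limit G(s-), where only censorings strictly before s are counted.
    """
    steps = []  # (censoring time u, G just after u)
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(s: float, left: bool = False) -> float:
        value = 1.0
        for u, g_u in steps:
            if u < s or (u == s and not left):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier_score(
    times: Sequence[float],
    events: Sequence[int],
    surv_at_t: Sequence[float],
    t: float,
) -> float:
    """IPCW Brier score at horizon t (Graf et al.-style estimator, assumed).

    surv_at_t[i] is the model's predicted probability that subject i
    survives beyond t; events[i] is 1 for an observed death, 0 if censored.
    """
    G = km_censoring_survival(times, events)
    total = 0.0
    for t_i, e_i, s_i in zip(times, events, surv_at_t):
        if t_i <= t and e_i == 1:
            # Died by t: the ideal prediction is 0; weight by 1 / G(t_i-).
            total += s_i ** 2 / G(t_i, left=True)
        elif t_i > t:
            # Still alive at t: the ideal prediction is 1; weight by 1 / G(t).
            total += (1.0 - s_i) ** 2 / G(t)
        # Censored at or before t: status at t is unknown, so no direct term;
        # the inverse weights on the other subjects compensate for it.
    return total / len(times)
```

For example, `ipcw_brier_score([1, 2, 3, 4], [1, 0, 1, 1], [0.2, 0.4, 0.7, 0.9], 2.5)` gives 0.0475 (up to floating-point rounding): the patient censored at time 2 drops out, and the two patients still alive at 2.5 are up-weighted by 1 / G(2.5) = 1.5. Lower is better; in the uncensored case a constant prediction of 0.5 scores 0.25.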

Highlights

Breast cancer is a leading cause of morbidity and mortality in women worldwide, and the prediction of breast cancer prognosis is crucial for decision-making.
In tests of the difference in concordance index (C-index), the random survival forest (RSF) model significantly outperformed the Cox-Elastic Net (EN) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001).
In terms of feature importance, both elastic net and RSF identified TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter as the 5 most important features for predicting breast cancer prognosis.
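The Highlights compare models by the C-index. To show what that statistic measures, the following Python sketch computes Harrell's C-index from scratch. The paper does not say which concordance estimator or software it used, and the function name `harrell_c_index` and the toy data are illustrative assumptions. In practice the risk score would come from a fitted model (for example, the linear predictor of a Cox model); the significance tests between C-indices reported above require a separate procedure not shown here.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's observed time. It is
    concordant when i also has the higher predicted risk; tied risk
    scores count as half-concordant. This simplified version skips
    pairs with tied observed times. Runtime is O(n^2).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with `times=[2, 4, 6, 8]`, `events=[1, 1, 0, 1]`, and `risk_scores=[0.9, 0.6, 0.7, 0.1]`, there are 5 comparable pairs and 4 are concordant, so the function returns 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.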

Summary

Introduction

Breast cancer is a leading cause of morbidity and mortality in women worldwide, and the prediction of breast cancer prognosis is crucial for clinical decision-making. Two well-known online prognostic prediction tools for breast cancer are based on clinical and pathological characteristics [1,2]. These models have been validated on external data sets and are commonly used in the United States and Western Europe. Machine learning methods have been increasingly explored in cancer prognosis owing to the emergence of improved algorithms that can use censored data for modeling, such as support vector machines for survival analysis and random survival forests (RSF). It is still debated whether traditional (Cox proportional hazards regression) or machine learning-based prognostic models have better predictive performance.
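For reference, the Cox proportional hazards model and an elastic-net-penalized variant (the Cox-EN model compared in this study) are commonly written as follows. Here t_i is patient i's observed follow-up time, δ_i = 1 if death was observed and 0 if censored, x_i is the feature vector, and h_0 is an unspecified baseline hazard. Censored patients enter the likelihood only through the risk sets R(t_i), which is how the model uses censored data. The penalty notation (λ ≥ 0, α in [0, 1], with α = 1 giving the lasso and α = 0 ridge) follows a common convention such as glmnet's, which additionally scales the log-likelihood by 1/n; the paper's exact parameterization is not given, and corrections for tied event times are omitted.

```latex
\begin{align*}
h(t \mid \mathbf{x}) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}\right) \\
\ell(\boldsymbol{\beta}) &= \sum_{i:\,\delta_i = 1} \Big[ \boldsymbol{\beta}^\top \mathbf{x}_i
  - \log \sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}_j\right) \Big],
  \qquad R(t_i) = \{\, j : t_j \ge t_i \,\} \\
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}} \Big\{ -\ell(\boldsymbol{\beta})
  + \lambda \Big[ \alpha \,\lVert \boldsymbol{\beta} \rVert_1
  + \tfrac{1-\alpha}{2} \,\lVert \boldsymbol{\beta} \rVert_2^2 \Big] \Big\}
\end{align*}
```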



Journal: JMIR Medical Informatics
Publication Date: Feb 18, 2022
Citations: 21
License type: CC BY


Similar Papers

A radiomics-based model can predict recurrence-free survival of hepatocellular carcinoma after curative ablation
Wei Peng ... Ling Zhang
Asian Journal of Surgery | VOL. 46 | 07 Nov 2022

Prognosis prediction of extremity and trunk wall soft-tissue sarcomas treated with surgical resection with radiomic analysis based on random survival forest.
Yuhan Yang ... Yixi Wang
Updates in Surgery | VOL. 74 | 18 May 2021

MRI-based random survival Forest model improves prediction of progression-free survival to induction chemotherapy plus concurrent Chemoradiotherapy in Locoregionally Advanced nasopharyngeal carcinoma
Wei Pei ... Yunyun Wei
BMC Cancer | VOL. 22 | 06 Jul 2022

Machine Learning–Based Prognostic Model for Patients After Lung Transplantation
Dong Tian ... Yu-Jie Zuo
JAMA Network Open | VOL. 6 | 05 May 2023
