Machine learning algorithms are comparable to conventional regression models in predicting distant metastasis of follicular thyroid carcinoma.

Yaqian Mao, Huibin Huang, Jixing Liang, Gang Chen, Wei Lin, Liantao Li, Junping Wen, Huiyu Lan

doi:10.1111/cen.14693

Abstract

Distant metastasis often indicates a poor prognosis, so early screening and diagnosis are important. Our study aimed to construct and validate a predictive model based on machine learning (ML) algorithms that can estimate the risk of distant metastasis in newly diagnosed follicular thyroid carcinoma (FTC). This was a retrospective study based on the Surveillance, Epidemiology, and End Results (SEER) database from 2004 to 2015. A total of 5809 FTC patients were included in the data analysis, of whom 214 (3.68%) had distant metastasis. Univariate and multivariate logistic regression (LR) analyses were used to determine independent risk factors. Seven commonly used ML algorithms were applied for predictive model construction. We used the area under the receiver-operating characteristic curve (AUROC) to select the best ML algorithm. The optimal model was trained through 10-fold cross-validation and visualized by SHapley Additive exPlanations (SHAP). Finally, we compared it with the traditional LR method. For predicting distant metastasis, the AUROCs of the seven ML algorithms ranged from 0.746 to 0.836 in the test set. Among them, Extreme Gradient Boosting (XGBoost) had the best prediction performance, with an AUROC of 0.836 (95% confidence interval [CI]: 0.775-0.897). After 10-fold cross-validation, its AUROC reached 0.855 (95% CI: 0.803-0.906), which was slightly higher than that of the classic binary LR model [AUROC: 0.845 (95% CI: 0.818-0.873)]. The XGBoost approach was comparable to the conventional LR method for predicting the risk of distant metastasis of FTC.
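For readers unfamiliar with the metric used to compare the models, the following minimal Python sketch shows one way an AUROC and an accompanying 95% CI can be computed. It is illustrative only: the abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`auroc`, `bootstrap_ci`) and the toy labels and scores are hypothetical, not taken from the SEER data.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an assumed method; the
    abstract does not say how its intervals were derived)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = distant metastasis, 0 = no distant metastasis.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

auc = auroc(labels, scores)
ci = bootstrap_ci(labels, scores)
```

In the toy data, 14 of the 15 positive-negative pairs are ranked correctly, so `auc` equals 14/15. In practice, a library routine would normally replace the quadratic pairwise loop, which is written out here only to make the definition explicit.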

Full Text