Colorectal cancer (CRC) is a heterogeneous group of malignancies with distinct clinical features. The association of these features with venous thromboembolism (VTE) has yet to be clarified. Machine learning (ML) models are well suited to improving VTE prediction in CRC because they can take in a large number of features and learn implicit correlations in the data. Data were extracted for 4,914 patients with CRC between August 2019 and August 2022, of whom 1,191 who underwent surgery on the primary tumor site with curative intent were included. The variables analyzed included patient-level factors, cancer-level factors, and laboratory test results. Model training was conducted on 30% of the dataset using ten-fold cross-validation, and model validation was performed using the total dataset. The primary outcome was VTE occurrence within 30 days after surgery. Six ML algorithms were applied for model fitting: logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), weighted support vector machine (SVM), a multilayer perceptron (MLP) network, and a long short-term memory (LSTM) network. Models were evaluated on six indicators: area under the receiver operating characteristic curve (ROC-AUC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), and Brier score. Two existing VTE risk models (Caprini and Khorana) were used as benchmarks. The incidence of postoperative VTE was 10.8%. The top ten significant predictors were lymph node metastasis, C-reactive protein, tumor grade, anemia, primary tumor location, sex, age, D-dimer level, thrombin time, and tumor stage. The XGBoost model showed the best performance, with a ROC-AUC of 0.990, a SEN of 96.9%, and a SPE of 96.1% in the training dataset, and a ROC-AUC of 0.908, a SEN of 77.5%, and a SPE of 93.7% in the validation dataset.
All ML models outperformed the previously developed models (Caprini and Khorana). This study developed postoperative VTE predictive models using six ML algorithms. The XGBoost VTE model may serve as a complementary tool for clinical decision-making on VTE prophylaxis, and the identified risk factors may inform VTE risk stratification in CRC patients.
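For readers unfamiliar with the six evaluation indicators named above, the following is a minimal sketch of how they can be computed from binary outcome labels and predicted probabilities. It is not the authors' pipeline: the function name `evaluate_binary`, the 0.5 decision threshold, and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, Sequence


def evaluate_binary(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off; the abstract does not state one
) -> Dict[str, float]:
    """Compute ROC-AUC, SEN, SPE, PPV, NPV and Brier score (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a class or prediction is absent.
        return num / den if den else float("nan")

    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

    # ROC-AUC as the probability that a random positive case is scored
    # higher than a random negative case (ties count as one half).
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
    )

    return {
        "roc_auc": ratio(wins, len(pos) * len(neg)),
        "sen": ratio(tp, tp + fn),  # sensitivity: true positive rate
        "spe": ratio(tn, tn + fp),  # specificity: true negative rate
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
        "brier": brier,
    }


if __name__ == "__main__":
    # Toy data, not taken from the study.
    labels = [1, 1, 0, 0, 0]
    probs = [0.9, 0.4, 0.6, 0.2, 0.1]
    for name, value in evaluate_binary(labels, probs).items():
        print(f"{name}: {value:.3f}")
```

SEN, SPE, PPV, and NPV depend on the chosen threshold, whereas ROC-AUC and the Brier score are computed directly from the predicted probabilities, which is one reason the two groups of indicators are usually reported together.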