Objective: Presentation delay prevents cancer patients from receiving timely diagnosis and treatment, leading to poor prognosis. Predicting the risk of presentation delay is crucial for improving treatment outcomes. This study aimed to develop and validate machine learning models for predicting the risk of presentation delay in patients with gastric cancer.

Methods: A total of 875 patients with gastric cancer admitted to a tertiary oncology hospital from July 2023 to June 2024 formed the derivation cohort, and 200 patients with gastric cancer admitted to four other tertiary hospitals formed the external validation cohort. After data collection, statistical analysis was performed to identify variables that discriminate presentation delay, and 13 statistically significant variables were selected for model development. The derivation cohort was randomly split into training and internal validation sets at a ratio of 7:3. Prediction models were developed with six machine learning algorithms: logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and multi-layer perceptron (MLP). The discrimination and calibration of each model were assessed with accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, area under the curve (AUC), calibration curves, and Brier scores. The best model was selected by comparing these metrics, and the impact of each feature on its predictions was analyzed with the permutation feature importance method.

Results: The incidence of presentation delay among gastric cancer patients was 39.3%.
In the internal validation set, the developed models achieved an AUC of 0.893-0.925, accuracy of 0.817-0.847, sensitivity of 0.857-0.905, specificity of 0.783-0.854, PPV of 0.728-0.798, NPV of 0.897-0.927, F1 score of 0.791-0.826, and Brier score of 0.107-0.138, indicating good discrimination and calibration for predicting presentation delay in gastric cancer patients. Among all models, the RF-based model was selected as the best because it achieved good discrimination and calibration on both the internal and external validation sets. Feature ranking indicated that both subjective and objective factors have a significant impact on the occurrence of presentation delay in gastric cancer patients.

Conclusion: This study demonstrated that the RF-based model performs well in predicting presentation delay in gastric cancer patients. It can help medical staff identify gastric cancer patients at high risk of presentation delay and take appropriate, targeted interventions to reduce that risk.
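
For readers unfamiliar with the evaluation metrics and the permutation feature importance method named above, the following is a minimal, dependency-free sketch of how they can be computed. It is not the study's code: the 0.5 decision threshold, the use of accuracy as the score inside the permutation step, the toy data, and the names `binary_metrics`, `permutation_importance`, and `toy_predict` are all assumptions made for illustration. AUC and calibration curves are omitted for brevity.

```python
import random


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics and Brier score for binary labels (1 = delay).

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        # Brier score: mean squared gap between predicted probability and outcome.
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true),
    }


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.

    `predict` is any function mapping a list of rows to predicted probabilities.
    """
    rng = random.Random(seed)
    baseline = binary_metrics(y, predict(X))["accuracy"]
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = binary_metrics(y, predict(X_perm))["accuracy"]
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): eight patients with true labels and predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.7]
metrics = binary_metrics(y_true, y_prob)

# Toy model (hypothetical): the risk is read from feature 0; feature 1 is ignored.
X = [[0.9, 5.0], [0.8, 1.0], [0.7, 3.0], [0.2, 4.0],
     [0.3, 2.0], [0.1, 6.0], [0.4, 7.0], [0.6, 8.0]]


def toy_predict(rows):
    return [row[0] for row in rows]


importances = permutation_importance(toy_predict, X, y_true)
```

In this toy setup the ignored feature receives an importance of exactly zero, because shuffling it cannot change any prediction. A feature the model relies on shows a non-negative accuracy drop here, since the toy model's unshuffled accuracy is already perfect.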