Fatigue is the most prevalent symptom across cancer types. To support clinicians in providing fatigue-related supportive care, this study aims to develop and compare models predicting clinically relevant fatigue (CRF) occurring between two and three years after diagnosis, and to assess the validity of the best-performing model across diverse cancer populations. Patients with non-metastatic bladder, colorectal, endometrial, ovarian, or prostate cancer were included if they completed a questionnaire within three months after diagnosis and a follow-up questionnaire between two and three years thereafter. Predictors comprised clinical, sociodemographic, and patient-reported variables. The outcome was CRF, defined as an EORTC QLQ-C30 fatigue score ≥ 39. Logistic regression with LASSO variable selection was compared with more advanced machine learning (ML) models, including extreme gradient boosting (XGBoost), support vector machines (SVM), and artificial neural networks (ANN). Internal-external cross-validation was performed on the best-performing model. In total, 3160 patients were included. The logistic regression model had the highest C-statistic (0.77), indicating good discrimination between patients with and without CRF, and the highest balanced accuracy (0.65). However, sensitivity was low across all models (0.22-0.37). In internal-external validation, performance was consistent across cancer types (C-statistics 0.73-0.82). Although discrimination was good, the low balanced accuracy and poor calibration among patients with CRF indicate a relatively high likelihood that future CRF will be underdiagnosed. The clinical applicability of the model therefore remains uncertain. The logistic regression model outperformed the ML-based models and was robust across cohorts, suggesting an advantage of simpler models for predicting CRF.
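The abstract reports a C-statistic, balanced accuracy, and sensitivity, and it validates the model by internal-external cross-validation across cancer types. The following Python sketch shows how these quantities are conventionally computed. It is not the authors' code. Several details are assumptions:

- The ≥ 39 cut-off is applied to the 0-100 QLQ-C30 fatigue scale score.
- Classification uses a 0.5 probability cut-off; the abstract does not state the threshold.
- Each internal-external fold holds out one cancer type.
- The helper names (`has_crf`, `c_statistic`, `classification_metrics`, `internal_external_folds`) are hypothetical.
- The toy data are invented.

```python
from itertools import product

# Assumed cut-off: QLQ-C30 fatigue scale score (0-100) >= 39 defines CRF.
CRF_THRESHOLD = 39


def has_crf(fatigue_score: float) -> int:
    """Return 1 if the fatigue score meets the CRF threshold, else 0."""
    return int(fatigue_score >= CRF_THRESHOLD)


def c_statistic(y_true, y_prob):
    """C-statistic (area under the ROC curve) from all case/non-case pairs.

    A pair counts 1 if the patient with CRF has the higher predicted risk,
    0.5 for a tie, and 0 otherwise.
    """
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one patient with and one without CRF")
    concordant = 0.0
    for p_case, p_non in product(cases, non_cases):
        if p_case > p_non:
            concordant += 1.0
        elif p_case == p_non:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def classification_metrics(y_true, y_prob, cutoff=0.5):
    """Sensitivity, specificity and balanced accuracy at a probability cut-off.

    The 0.5 default is an assumption; the study's threshold is not stated.
    """
    y_pred = [int(p >= cutoff) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without CRF")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def internal_external_folds(cancer_types):
    """Yield (held_out_type, train_idx, test_idx), leaving out one cancer type per fold.

    Only the splits are produced; fitting on train_idx and evaluating on
    test_idx is left to the caller.
    """
    for held_out in sorted(set(cancer_types)):
        train_idx = [i for i, c in enumerate(cancer_types) if c != held_out]
        test_idx = [i for i, c in enumerate(cancer_types) if c == held_out]
        yield held_out, train_idx, test_idx


# Invented toy data for illustration only (not from the study).
fatigue_scores = [20, 45, 60, 10, 33, 50, 25, 80]
probs = [0.10, 0.40, 0.70, 0.20, 0.45, 0.30, 0.15, 0.60]
cancer_types = ["bladder", "colorectal", "prostate", "bladder",
                "ovarian", "prostate", "colorectal", "endometrial"]

y = [has_crf(s) for s in fatigue_scores]
print(c_statistic(y, probs))             # 0.875
print(classification_metrics(y, probs))  # {'sensitivity': 0.5, 'specificity': 1.0, 'balanced_accuracy': 0.75}
for held_out, train_idx, test_idx in internal_external_folds(cancer_types):
    print(held_out, len(train_idx), len(test_idx))
```

On these invented data the model ranks patients well (C = 0.875), yet it misses half of the true CRF cases at the 0.5 cut-off. This illustrates how a respectable C-statistic can coexist with low sensitivity.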