Abstract

Background
Fatigue is commonly identified by patients with IBD as one of the most important issues affecting their wellbeing¹. Despite advanced immune therapies, 50% of patients experience significant fatigue. The central mechanisms, beyond gut inflammation, that drive fatigue are not known.

Methods
We applied five machine-learning (ML) approaches to a combined prospective and cross-sectional multi-center cohort² in the UK (2020-24) to develop models and identify factors that predict IBD-associated fatigue. ML modelling involved 531 patients with IBD (260 UC, 246 CD, 25 IBDU), ~100 clinical variables (including CUCQ-32 patient-reported outcomes, seasonality, BMI, drug treatments, disease phenotype and all hospital-based blood and stool tests) and 1200 data points over time. The ML approaches were gradient boosting, random forests, logistic regression, support vector machines and deep neural networks. We used several methods to assess the relative importance of clinical factors and to evaluate model performance. Fatigue models were trained on the overall IBD cohort and on patients in remission.

Results
Using an IBD CUCQ-32 dataset of 2500 patient responses (Figure 1), we defined IBD-fatigue-high as patient-reported significant fatigue on 10 out of 14 days. Across train, validation and test steps, traditional statistics and machine learning performed similarly (random forest AUC 0.75; logistic regression AUC 0.74, using R). AdaBoost, XGBoost, random forest and logistic regression models performed similarly, whereas support vector classifiers and neural networks performed poorly (Table 1). SHapley Additive exPlanations (SHAP) analysis of the models³, which is based on game theory, revealed that each model prioritises different variables: the top three factors for XGBoost were has_active_symptoms, weight and platelets, whereas logistic regression ranked seasonality variables (autumn and winter) higher.
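SHAP builds on Shapley values from cooperative game theory: a feature's attribution is its weighted average marginal contribution to a prediction over all subsets (coalitions) of the other features. The sketch below computes exact Shapley values by brute-force enumeration, which is only feasible for a handful of features. It is an illustration under stated assumptions, not the SHAP library or the authors' pipeline: a feature outside a coalition is simply set to a single hypothetical baseline value, and `toy_fatigue_score`, its coefficients, the platelet cut-off and the patient values are all invented for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for a single prediction, by enumerating coalitions.

    Simplification (assumption): a feature absent from a coalition takes its
    baseline value, rather than being averaged over a background dataset.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Use the patient's value for features in the coalition, baseline otherwise.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_fatigue_score(z: Sequence[float]) -> float:
    # Hypothetical score; features are [has_active_symptoms, weight, platelets].
    active, weight, platelets = z
    # The last term is a non-linear interaction: it only contributes when
    # symptoms are active AND platelets exceed a hypothetical cut-off.
    return 0.4 * active + 0.01 * weight + 0.2 * active * (platelets > 400)


patient = [1, 80, 450]    # hypothetical patient
reference = [0, 75, 300]  # hypothetical baseline
phi = exact_shapley(toy_fatigue_score, patient, reference)
print([round(v, 3) for v in phi])  # [0.5, 0.05, 0.1]
```

For this toy score the attributions sum to the difference between the patient's score and the baseline score (0.65), which is the efficiency property of Shapley values, and the 0.2 interaction term is split equally between has_active_symptoms and platelets. Because enumeration grows as 2^n coalitions, the SHAP library relies on approximations such as KernelSHAP or the tree-specific TreeSHAP for realistic feature counts.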
The analysis indicates the potential of ML to integrate non-linear factors in ways that traditional statistics do not capture well. However, performance in predicting IBD-fatigue-high among patients in biochemical remission (CRP <5 and calprotectin <250) was poor, with AUCs of 0.61 for logistic regression and 0.66 for random forest.

Conclusion
We provide a comprehensive ML pipeline to predict IBD-associated fatigue. Despite integrating multiple clinical factors and using interpretable ML approaches, prediction of fatigue in patients with IBD remains suboptimal. Our data suggest a large ‘hidden’ pathobiological component, and work is in progress to integrate deep molecular data and build a clinical-scientific AI model to improve understanding of IBD-associated fatigue.
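The remission-subgroup evaluation can be sketched as two steps: select records that meet the biochemical-remission thresholds stated above, then compute AUC only within that subset. AUC is computed here from its rank interpretation, the probability that a randomly chosen fatigue-high record receives a higher predicted score than a randomly chosen non-fatigued record, with ties counted as one half. This is a minimal sketch, not the authors' code: the field names (`crp`, `calprotectin`, `fatigue_high`, `score`) and all values are hypothetical, and the abstract does not state units for the thresholds.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def in_biochemical_remission(record: Dict[str, float]) -> bool:
    # Thresholds as stated in the abstract: CRP < 5 and calprotectin < 250.
    return record["crp"] < 5 and record["calprotectin"] < 250


def subgroup_auc(records: List[Dict[str, float]]) -> float:
    # Hypothetical keys: "fatigue_high" is the 0/1 label, "score" the model output.
    labels = [int(r["fatigue_high"]) for r in records]
    scores = [r["score"] for r in records]
    return roc_auc(labels, scores)


# Hypothetical records: a model score plus the two remission markers.
records = [
    {"crp": 2, "calprotectin": 50, "fatigue_high": 1, "score": 0.7},
    {"crp": 3, "calprotectin": 120, "fatigue_high": 0, "score": 0.6},
    {"crp": 1, "calprotectin": 80, "fatigue_high": 1, "score": 0.4},
    {"crp": 4, "calprotectin": 200, "fatigue_high": 0, "score": 0.5},
    {"crp": 12, "calprotectin": 900, "fatigue_high": 1, "score": 0.9},
    {"crp": 20, "calprotectin": 600, "fatigue_high": 0, "score": 0.3},
]

remission = [r for r in records if in_biochemical_remission(r)]
overall_auc = subgroup_auc(records)
remission_auc = subgroup_auc(remission)
print(f"overall AUC:   {overall_auc:.2f}")    # overall AUC:   0.78
print(f"remission AUC: {remission_auc:.2f}")  # remission AUC: 0.50
```

In this toy data the same scores separate the classes reasonably well overall but no better than chance within the remission subset, because the most inflamed records, which are easiest to rank, are excluded. The pairwise loop is O(n_pos × n_neg); on real data a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.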