Abstract

Background and Aims
INitiativeS on advancing Patients’ outcomes In REnal disease (INSPIRE) is an academia and industry collaboration established to identify critical investigations needed to advance the practice of medicine in nephrology. Gastrointestinal bleeding (GIB) is one of the most common types of bleeding events in the kidney dialysis population [1]. We aimed to develop a model to predict the risk of GIB hospitalization within 180 days in hemodialysis (HD) patients. We evaluated an advanced machine learning algorithm (XGBoost) and compared it with a traditional machine learning algorithm (logistic regression).

Method
We used data from a kidney care network from 2017 through 2020 and included adult dialysis patients (age ≥18 years). GIB-related hospitalization was defined by International Classification of Diseases (ICD) diagnosis codes recorded as the primary, secondary, or tertiary discharge reason for hospitalization. Two distinct models were built on the same dataset, one using XGBoost and one using logistic regression. Both models were evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity. Missing data were imputed using the mean for quantitative variables and the mode for qualitative variables. The dataset was then randomly divided into training (60%), validation (20%), and test (20%) sets. The test set, comprising patients and data that the model never encountered during training or validation, was used to evaluate model performance.

Results
The incidence of 180-day GIB hospitalization was 1.12% in the HD population (n = 5 116 with GIB hospitalization / n = 451 653 without GIB hospitalization) and was consistent in the unseen test dataset (n = 465 with GIB hospitalization / n = 38 586 without GIB hospitalization).
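The preprocessing described in the Method section (mean/mode imputation followed by a random 60/20/20 split) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `impute` and `split_60_20_20`, the record layout (a list of dictionaries with `None` for missing values), and the row-level shuffle are all assumptions. The abstract does not say which software was used or whether the split was made per patient or per record.

```python
import random
from statistics import mean, mode


def impute(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced by the column mean or mode.

    Hypothetical helper: column names and record layout are illustrative.
    Raises statistics.StatisticsError if a column has no observed values.
    """
    filled = [dict(r) for r in rows]  # leave the caller's records untouched
    for cols, summarize in ((numeric_cols, mean), (categorical_cols, mode)):
        for col in cols:
            observed = [r[col] for r in rows if r[col] is not None]
            fill_value = summarize(observed)
            for r in filled:
                if r[col] is None:
                    r[col] = fill_value
    return filled


def split_60_20_20(rows, seed=0):
    """Shuffle rows and return (train, validation, test) in a 60/20/20 ratio.

    Assumption: a simple row-level shuffle with a fixed seed. Integer
    arithmetic avoids floating-point rounding of the split sizes; any
    remainder goes to the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```

The sketch follows the order stated in the abstract (imputation first, then splitting). Only the test portion would be held back for the final evaluation, with the validation portion used during model development.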
The XGBoost model showed higher predictive performance than logistic regression (Table). The AUROC was 0.72 (95% confidence interval (CI) 0.69, 0.74) for the XGBoost model and 0.615 (95% CI 0.60, 0.64) for the logistic regression model. Specificity was 67% for the XGBoost model versus 58% for the logistic regression model, whereas sensitivity was 65% for both models. Many of the top predictors of major GIB were consistent across the two models, although some factors differed in their importance for outcome prediction (Fig. 1). In both models, higher risk of GIB hospitalization was associated with older age, lower ferritin levels, and recent all-cause hospitalizations. The XGBoost model assigned high importance to lower hemoglobin and higher serum 25-hydroxy (25OH) vitamin D values, whereas the logistic regression model assigned high importance to higher amounts of saline delivered during an HD treatment and lower intact parathyroid hormone (iPTH) levels.

Conclusion
Advanced machine learning prediction modeling (XGBoost) appears suitable for identifying an HD patient at risk of GIB hospitalization in the next 180 days, and it outperformed traditional machine learning (logistic regression) modeling. Although both models showed identical sensitivity, the XGBoost model had higher specificity. Prospective testing is needed to confirm the model's performance. The association between bone mineral metabolism markers and GIB hospitalization risk is unexpected and warrants further investigation.
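For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, accuracy, and AUROC can be computed from true labels and predicted risk scores. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the 0.5 default threshold, and the pairwise AUROC formulation are choices made here for clarity. The abstract does not report the classification threshold or how the confidence intervals were obtained.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a given score threshold.

    y_true holds 1 (GIB hospitalization) or 0 (none); y_score holds the
    model's predicted risk. The threshold value is an assumption.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # share of true events flagged
        "specificity": tn / (tn + fp),  # share of non-events not flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half.

    This pairwise form is quadratic in the number of cases, so it suits
    small illustrations rather than datasets of the size reported above.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    total = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positives) * len(negatives))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is why two models can share the same sensitivity yet differ in specificity and AUROC.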