Background/Objectives: Predicting patient readmission is an important task in healthcare risk management, as it can help prevent adverse events, reduce costs, and improve patient outcomes. In this paper, we compare conventional machine learning and deep learning models on a multimodal dataset of electronic discharge records from an Irish acute hospital. Methods: We evaluate several widely used machine learning models that use patient demographics, historical hospitalization records, and clinical diagnosis codes to forecast future clinical risk. Our work addresses two key challenges in the medical field, data imbalance and the variety of data types, in order to improve the performance of machine learning algorithms. We also use SHapley Additive exPlanations (SHAP) value visualization to interpret the model predictions and to identify the key data features and the specific diagnosis codes that are significant predictors of readmission within 30 days. Results: Through extensive benchmarking and a variety of feature engineering techniques, we improved the area under the receiver operating characteristic curve (AUROC) from 0.628 to 0.7 across our models on the test dataset. We also found that specific diagnoses, including cancer, COPD, and certain social factors, are significant predictors of 30-day readmission risk. Conversely, bacterial carrier status appeared to have minimal impact, owing to its low case frequency. Conclusions: Our study demonstrates that routinely collected hospital data can be used to forecast patient readmission with conventional machine learning, and that explainable AI techniques can be applied to explore the correlation between data features and patient readmission rate.
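For readers less familiar with the AUROC metric reported in the Results, the following is a minimal sketch of how it can be computed from predicted risk scores. It is not the study's evaluation code: the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise definition, in which AUROC is the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted patient, with ties counted as one half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the study): 1 = readmitted within 30 days.
# The classes are deliberately imbalanced, as readmission data usually is.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.40, 0.55]

toy_auroc = auroc(toy_labels, toy_scores)
print(f"Toy AUROC: {toy_auroc:.3f}")
```

Because the metric depends only on how positive cases rank against negative ones, it does not change with the proportion of readmitted patients in the evaluation set, which is one common reason for preferring it over accuracy on imbalanced data. The quadratic loop is adequate for illustration; a rank-based implementation would be preferable for large datasets.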