Background: Most patients with end-stage renal disease rely on hemodialysis (HD) to stay alive, and they face a serious financial burden and a high risk of mortality. Because of the current state of the health care system in China, a large number of patients on HD are lost to follow-up, which makes it difficult to identify patients at high risk of mortality.
Objective: This paper proposes a mortality prediction approach for patients on maintenance HD that uses longitudinal HD data and accounts for the data imbalance caused by losses to follow-up.
Methods: A long short-term memory autoencoder (LSTM AE) based model is proposed to capture changes in the physical condition of HD patients and to distinguish surviving from nonsurviving patients. The approach adopts anomaly detection theory: only surviving samples are used to train the model, and dead samples are identified by their autoencoder reconstruction errors. The data come from the electronic health record system of a Chinese hospital and cover July 30, 2007, to August 25, 2016. Sequences of 36/72/108 continuous HD sessions were used to predict mortality within prediction windows of 90/180/365 days. The model's performance was compared with that of logistic regression, support vector machine, random forest, LSTM classifier, isolation forest, and stacked autoencoder models.
Results: Data for 1200 patients (survival: 1055, death: 145) were used to predict mortality during the next 90 days from 36 continuous HD sessions. For the LSTM AE, the area under the precision-recall (PR) curve was 0.57, the macro-averaged recall was 0.86, and the macro-averaged F1-score was 0.87, outperforming the other models. When the length of the observation window or the prediction window was varied, the LSTM AE continued to outperform the other models. According to the variable importance analysis, dialysis session length was the feature that contributed most to the prediction model.
Conclusions: The proposed approach was able to detect patients on maintenance HD with high mortality risk from an imbalanced dataset using anomaly detection theory and leveraging longitudinal HD data.
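The decision step described in the Methods (flagging nonsurviving patients by reconstruction error) and the macro-averaged metrics reported in the Results can be illustrated with a minimal sketch. It does not reproduce the LSTM AE itself: the reconstruction errors are made-up numbers standing in for the output of a trained autoencoder, the 90th-percentile cutoff is an assumption (the abstract does not say how the threshold was chosen), and the function names `percentile_threshold`, `flag_high_risk`, and `macro_recall_f1` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def percentile_threshold(errors: Sequence[float], q: float) -> float:
    """Nearest-rank q-th percentile (0 < q <= 100) of the given errors."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def flag_high_risk(errors: Sequence[float], threshold: float) -> List[int]:
    """Label a sample 1 (predicted death) if its error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


def macro_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Unweighted mean of per-class recall and per-class F1 over classes 0 and 1."""
    recalls, f1s = [], []
    for cls in (0, 1):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / 2, sum(f1s) / 2


# Hypothetical reconstruction errors for training samples (survivors only).
train_errors = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.14, 0.10, 0.20]
# Assumed cutoff: 90th percentile of the survivors' errors.
threshold = percentile_threshold(train_errors, 90)

# Hypothetical held-out errors with true labels (1 = died within the window).
test_errors = [0.09, 0.11, 0.30, 0.16, 0.12, 0.45, 0.14, 0.10]
test_labels = [0, 0, 1, 0, 0, 1, 0, 1]

predictions = flag_high_risk(test_errors, threshold)
recall_macro, f1_macro = macro_recall_f1(test_labels, predictions)
print(threshold, predictions, round(recall_macro, 3), round(f1_macro, 3))
```

Because the threshold is derived from surviving samples alone, this step needs no death labels during training, which is the property that makes the anomaly detection framing suitable for an imbalanced cohort. Macro averaging weights both classes equally, so a model that ignores the minority (death) class cannot score well.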