The triage process in emergency departments (EDs) relies on the subjective assessment of medical practitioners, which makes it unreliable in some respects. A more accurate and objective algorithm is needed to determine the urgency of patients. This paper explores the application of advanced data-synthesis algorithms, machine learning (ML) algorithms, and ensemble models to predict patient mortality. Patients predicted to be at risk of mortality are in critical condition and need immediate medical intervention. This paper aims to determine the most effective method for predicting mortality by improving the F1 score while maintaining a high area under the receiver operating characteristic curve (AUC). This study used a dataset of 7325 patients who visited the ED of Yonsei Severance Hospital in Seoul, South Korea. The patients were divided into two groups: those who died in the ED and those who did not. Various data-synthesis techniques, namely SMOTE, ADASYN, CTGAN, TVAE, CopulaGAN, and Gaussian Copula, were used to generate synthetic patient data. Twenty-two ML models were then applied, including tree-based algorithms such as Decision Tree, Random Forest, AdaBoost, LightGBM, CatBoost, XGBoost, and NGBoost; TabNet, a deep neural network algorithm; statistical algorithms such as Support Vector Machine, Logistic Regression, k-nearest neighbors, and Gaussian Naive Bayes; and ensemble models that combine the outputs of the ML models. The models were trained to predict patient mortality from 21 patient information features used in the pandemic influenza triage algorithm (PITA). When ML algorithms are evaluated on an imbalanced medical dataset, conventional metrics such as accuracy or AUC can be misleading.
This paper emphasizes the importance of using the F1 score, which balances precision and recall, as the primary performance measure for detecting patient mortality. The highest-ranked model for predicting mortality combined the Gaussian Copula data-synthesis technique with the CatBoost classifier, achieving an AUC of 0.9731 and an F1 score of 0.7059. These findings highlight the effectiveness of machine learning algorithms and data-synthesis techniques in improving mortality prediction in EDs.
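To illustrate why accuracy can mislead on an imbalanced dataset while the F1 score does not, the following minimal sketch compares two classifiers on a small invented cohort. The cohort size, the confusion-matrix counts, and the names `Confusion`, `always_survive`, and `useful_model` are assumptions made for illustration only; they are not taken from the study's data or results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with death as the positive class."""
    tp: int  # deaths correctly flagged
    fp: int  # survivors wrongly flagged
    fn: int  # deaths missed
    tn: int  # survivors correctly cleared

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall; 0 when both are 0.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical cohort (illustration only): 1000 visits, 20 deaths.
# A classifier that predicts "survives" for every patient:
always_survive = Confusion(tp=0, fp=0, fn=20, tn=980)
# A classifier that catches 12 of the 20 deaths with 8 false alarms:
useful_model = Confusion(tp=12, fp=8, fn=8, tn=972)

for name, cm in [("always_survive", always_survive),
                 ("useful_model", useful_model)]:
    print(f"{name}: accuracy={cm.accuracy():.3f} f1={cm.f1():.3f}")
```

In this invented example the two classifiers differ in accuracy by only 0.004 (0.980 versus 0.984), yet the F1 score separates them clearly (0.000 versus 0.600), because the first classifier never detects a single death.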