The Sydney Triage to Admission Risk Tool (START2) using machine learning techniques to support disposition decision-making.

Kathryn Rendell, Anja A Ebker‐White, Michael M Dinh, Andre Kyme, Irena Koprinska

doi:10.1111/1742-6723.13199

Abstract

This study aimed to further develop and refine an Emergency Department (ED) inpatient admission prediction model using machine learning techniques.

This was a retrospective analysis of state-wide ED data from New South Wales, Australia. Six classification algorithms (Bayesian networks, decision trees, logistic regression, naïve Bayes, neural networks and nearest neighbour) were each combined with five feature selection techniques (none, manual, correlation-based, information gain and wrapper). Presenting problem was categorised using two representations: broad (n = 20 categories) and specific (n = 100 categories). Models were evaluated using the Area Under the Curve (AUC) and accuracy, and the results were compared with the Sydney Triage to Admission Risk Tool (START), which uses logistic regression with six manually selected features.

Sixty admission prediction models (six algorithms × five feature selection techniques × two representations) were trained and validated using data from 1 721 294 patients. Under the broad representation of presenting problem, the nearest neighbour algorithm with manual feature selection had the best AUC of 0.8206 (95% CI ±0.0006), while the decision tree with no feature selection had the best accuracy of 74.83% (95% CI ±0.065). Under the specific representation, almost all models improved: the nearest neighbour algorithm with information gain feature selection had the best AUC of 0.8267 (95% CI ±0.0006), while the decision tree with wrapper or no feature selection had the best accuracy of 75.24% (95% CI ±0.064). Eleven of the machine learning models had a slightly higher AUC than the START model.

Machine learning methods performed similarly to logistic regression for ED disposition prediction using basic triage information. This warrants further investigation, particularly with larger data sets containing more complex clinical information.
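
The evaluation set-up described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: all identifiers (ALGORITHMS, GRID, roc_auc, accuracy, wald_half_width) are hypothetical, AUC is assumed to mean the area under the ROC curve and is computed with the rank-based Mann-Whitney formulation, accuracy assumes a 0.5 decision threshold, and the confidence interval assumes a normal (Wald) approximation for a proportion. The abstract does not state which threshold or interval method was used.

```python
import itertools
import math
from typing import Sequence

# Hypothetical labels for the experimental grid: 6 algorithms x 5 feature
# selection techniques x 2 presenting-problem representations.
ALGORITHMS = ["bayesian_network", "decision_tree", "logistic_regression",
              "naive_bayes", "neural_network", "nearest_neighbour"]
FEATURE_SELECTION = ["none", "manual", "correlation", "information_gain", "wrapper"]
REPRESENTATIONS = ["broad_20", "specific_100"]
GRID = list(itertools.product(ALGORITHMS, FEATURE_SELECTION, REPRESENTATIONS))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both admitted (1) and non-admitted (0) patients")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded prediction matches the label."""
    correct = sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1))
    return correct / len(labels)


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% CI for a proportion p over n cases."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    y = [0, 0, 1, 1]            # 1 = admitted
    s = [0.1, 0.4, 0.35, 0.8]   # predicted admission probabilities
    print(len(GRID))            # 60
    print(roc_auc(y, s))        # 0.75
    print(accuracy(y, s))       # 0.75
    # Half-width in percentage points for 74.83% accuracy over 1 721 294 patients.
    print(round(wald_half_width(0.7483, 1_721_294) * 100, 3))  # 0.065
```

Enumerating the grid reproduces the count of sixty models. The Wald half-width for 74.83% accuracy over all 1 721 294 patients comes to about 0.065 percentage points, which is numerically consistent with the interval reported above; this is only an arithmetic check, not a statement of how the authors computed their intervals.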