1. Creating a model to predict emergency department (ED) length of stay (LOS) at presentation, and assessing its performance.
2. Identifying the most powerful predictors to inform future models.
We collected data on consecutive medical patients at presentation in the ED of Far Eastern Memorial Hospital, Taiwan, from Feb 1 to Sep 30, 2016. Other variables, such as ambulance diversion duration and ED census, were also incorporated. In the 8-month period, 47,807 medical visits were observed. After excluding outliers (eg, high utilizers) and entries with missing values, we created a final dataset of 44,148 patient entries and 109 features, which was randomly split into a training set and a test set in a 7:3 ratio. The features fell into 4 categories: demographics, chief complaints, vital signs, and registration status. Random Forest, a machine-learning (ML) algorithm, was chosen for its interpretability and its ability to capture non-linear relationships. The model was trained and tuned on the training set, and then evaluated on the test set in terms of mean squared error (MSE).
Among the 44,148 observations, the median and mean ED LOS were 2.30 and 7.73 hours, respectively. A base-2 logarithmic transformation was applied to this outcome variable so that the distribution of the derived variable (denoted ‘EDLOS_hr_log’) was closer to a normal distribution. Table 1 shows descriptive demographic statistics for the training set and the test set. Our model achieved an MSE of 2.22 on the training set and 2.26 on the test set; the small gap between the two suggests low variance, while the size of the error suggests high bias. Because no benchmark model was available for comparison, we introduced a dummy regressor, which uses the training-set mean of EDLOS_hr_log as the predicted value for every future case, to put the model’s performance in context. This dummy regressor had an MSE of 3.19 on the test set; our model therefore reduced the test-set MSE by 29.2% relative to this baseline.
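The baseline comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names and the toy LOS values are hypothetical, and only the two MSE figures (2.26 and 3.19) come from the abstract. The improvement is assumed here to be the relative reduction in test-set MSE, which is consistent with the reported 29.2%.

```python
import math
from statistics import mean


def log2_hours(los_hours):
    """Base-2 log of LOS in hours (the transformation behind EDLOS_hr_log)."""
    return [math.log2(h) for h in los_hours]


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def dummy_mse(y_train, y_test):
    """MSE of a dummy regressor that always predicts the training-set mean."""
    baseline = mean(y_train)
    return mse(y_test, [baseline] * len(y_test))


def relative_improvement(model_mse, baseline_mse):
    """Fraction by which the model reduces MSE relative to the baseline."""
    return (baseline_mse - model_mse) / baseline_mse


# Hypothetical toy LOS values in hours, for illustration only.
toy_train = log2_hours([1, 2, 4, 8])
toy_test = log2_hours([2, 16])
print(dummy_mse(toy_train, toy_test))

# MSE figures reported in the abstract: model 2.26, dummy regressor 3.19.
print(f"{relative_improvement(2.26, 3.19):.1%}")  # prints 29.2%
```

Comparing against a mean-only predictor shows how much of the error reduction is attributable to the features rather than to the outcome's overall distribution.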
Moreover, our model could capture the pattern of ED load, described as the moving average of EDLOS_hr_log against date (Figure 1). Figure 2 summarizes the 15 most influential predictors and their categories, which may guide future modeling. The predictors are listed in rank order: the higher a predictor appears, the more influential it is.
The intersection of ML and medicine is emerging and evolving. Our model serves as an example of how machines can support medical and administrative decision-making. We found evidence that ED LOS is predictable at presentation. More importantly, the findings imply that strategies to tackle ED crowding and reduce ED LOS should incorporate hospital systemic constraints rather than focus only on the ED itself.

Table 1. Descriptive demographic statistics on the 2 subsets.

                 Training set       Test set
Patient counts   30,903             13,245
Age†             51.0 [19.9]        51.2 [19.9]
Triage 1         1,120 (3.62%)      466 (3.52%)
Triage 2         5,444 (17.6%)      2,401 (18.1%)
Triage 3         20,462 (66.2%)     8,712 (65.8%)
Triage 4         3,842 (12.4%)      1,643 (12.4%)
Triage 5         35 (0.11%)         23 (0.17%)
EDLOS_hr_log†    1.62 [1.77]        1.63 [1.79]