Abstract

Various demographic and medical factors can be linked to severe deterioration of patients suffering from traumatic injuries. Accurate identification of the most relevant variables is essential for building more accurate prediction models and making more rapid life-saving medical decisions. This paper aims to select a set of features that can accurately predict patients’ outcomes using three feature selection methods: random forest, ReliefF, and the evidential reasoning (ER) rule. The impact of class imbalance in the outcome on feature selection is discussed, and the synthetic minority over-sampling technique (SMOTE) is applied to show how the selected features differ. The results show that length of stay in hospital, length of stay in the intensive care unit, age, and Glasgow Coma Scale (GCS) are the most frequently selected features across the different techniques. Prediction models based on the features selected by the ER rule show the highest prediction performance, measured by the area under the receiver operating characteristic curve (AUC): the median AUC is 0.895 for the model built on the ten highest-weighted variables, compared with median AUC values of 0.827 and 0.885 when the ten highest-weighted variables are selected by ReliefF and random forest, respectively. The results also show that, beyond the ten most important features, adding less important features yields only a slight increase in prediction accuracy.
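
For readers less familiar with the AUC metric used to compare the models, the following is a minimal Python sketch of how it can be computed from predicted scores. It is not the paper's implementation: the function name `auc` and the toy labels and scores are assumptions introduced purely for illustration, and the pairwise formulation shown here is just one standard way to obtain the value.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = adverse outcome, 0 = otherwise.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported medians (0.827, 0.885, 0.895) should be read.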
