Classifying vehicular crashes by severity is crucial because crashes differ widely in their financial and injury consequences. Accurate prediction modeling also makes it possible to identify the factors that influence crashes and thereby help avoid them. Crash severity analysis therefore requires prediction models that are both accurate and fast. Moreover, statistical models are incapable of identifying the potential severity of crashes from the influencing factors they incorporate. Unlike previous studies, which applied data mining models to a limited set of crash severity classes (property damage only (PDO), fatality, and injury), the present study sought to predict crash frequency across five severity levels: PDO, fatality, severe injury, other visible injuries, and complaint of pain. The multinomial logistic regression (MLR) model and three data mining approaches, namely an artificial neural network-multilayer perceptron (ANN-MLP) and two decision tree techniques (Chi-square automatic interaction detector (CHAID) and C5.0), were applied to traffic crash records for State Highways in California, USA. A comparison of the relative importance of the ten qualitative and ten quantitative independent variables incorporated in CHAID and C5.0 indicated that the cause of the crash (X1) and the number of vehicles (X5) were the most influential variables. In contrast, the ANN-MLP model identified the cause of the crash (X1) and weather (X2) as the most contributing variables, while the MLR model showed that the driver’s age (X11) accounts for a larger proportion of traffic crash severity. The sensitivity analysis demonstrated that C5.0 had the best performance for predicting road crash severity. 
C5.0 not only ran faster (0.05 s) than CHAID, MLP, and MLR but also achieved the highest accuracy on the training set. Its overall prediction accuracy on the training data was approximately 88.09%, compared with 77.21% and 70.21% for the CHAID and MLP models, respectively. Overall, the findings of this study indicate that C5.0 can be a promising tool for predicting road crash severity.
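
The overall prediction accuracy reported above is the share of crashes whose predicted severity level matches the recorded one. A minimal sketch of that calculation follows, assuming each crash record carries one of the five severity labels; the function name `overall_accuracy` and the toy labels are hypothetical illustrations, not the study's data or code.

```python
from typing import Sequence

# The five severity levels named in the abstract (label spellings are assumed).
SEVERITY_LEVELS = (
    "PDO",
    "complaint of pain",
    "other visible injury",
    "severe injury",
    "fatality",
)


def overall_accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of records whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy, made-up labels for five crashes (illustration only, not study data).
actual = ["PDO", "PDO", "fatality", "severe injury", "complaint of pain"]
predicted = ["PDO", "PDO", "fatality", "other visible injury", "complaint of pain"]

accuracy = overall_accuracy(actual, predicted)
print(f"Overall accuracy: {accuracy:.2%}")
```

Computing the same figure for each model's predictions on a common set of records would give a like-for-like comparison of the kind the abstract reports for C5.0, CHAID, and MLP.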