Abstract
There is growing interest in applying machine learning techniques to predict motorcycle crash severity. This is partly due to progress in autonomous vehicle technology and in machine learning, which, as a main component of autonomous vehicles, could be applied to enhance traffic safety. Motorcycle crash fatalities are a concern in Wyoming: riders killed in motorcycle crashes in 2014 accounted for 11% of the state's total road fatalities. A first step toward crash reduction is to identify the factors that contribute to crashes, which requires a model that predicts crash outcomes with high accuracy. Thus, this study adopted random forest, support vector machine, multivariate adaptive regression splines and binary logistic regression techniques to predict the injury severity outcomes of motorcycle crashes. Although researchers have applied all of these techniques to model motorcycle injury severities, comparative analyses assessing the predictive power of such modeling frameworks are limited. Hence, this study contributes to the road safety literature by comparing the performance of these techniques. In this study, Wyoming's motorcycle crash injury severities are modeled as functions of the characteristics that give rise to crashes. Before conducting any analyses, feature reduction was used to identify the best number of predictors to include in the models. To obtain an unbiased estimate of the performance of the different machine learning techniques, 5-fold cross-validation was used for model evaluation. Two measures, the area under the curve (AUC) and the confusion matrix, were used to compare the models' performance. The results indicate that the random forest model outperformed the other models, with the lowest misclassification rate and the highest AUC.
The results also show that a dichotomous response variable, with fatal and incapacitating injuries in one category and all other severity levels in another, yields a lower misclassification rate than a polychotomous response variable. This might result from the nature of motorcycle crashes, in which riders lack the protection afforded by passenger cars, which prevents the machine learning techniques from being trained properly. Moreover, the most important variables identified by the random forest model are those related to operating speed, resentful other party, traffic volume, truck traffic volume, riding under the influence, horizontal curvature, wide roadways with more than two lanes, and rider's age.
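The evaluation measures named above can be illustrated with a minimal sketch. It is not the study's code: the severity label strings, the function names, and the unshuffled interleaved fold assignment are all assumptions made for illustration, and only the Python standard library is used. The sketch dichotomizes severity labels, builds a binary confusion matrix, derives the misclassification rate, computes AUC as the probability that a randomly chosen severe crash receives a higher score than a randomly chosen non-severe one, and generates 5-fold train/test index splits.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical label strings; the abstract only states that fatal and
# incapacitating injuries form one category and all other levels the other.
SEVERE = {"fatal", "incapacitating"}


def dichotomize(severity: str) -> int:
    """Map a severity label to 1 (severe) or 0 (all other levels)."""
    return 1 if severity in SEVERE else 0


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            cm["tp"] += 1
        elif t == 0 and p == 1:
            cm["fp"] += 1
        elif t == 0 and p == 0:
            cm["tn"] += 1
        else:
            cm["fn"] += 1
    return cm


def misclassification_rate(cm: Dict[str, int]) -> float:
    """Share of observations placed in the wrong class."""
    return (cm["fp"] + cm["fn"]) / sum(cm.values())


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; fold i holds indices i, i+k, i+2k, ...

    Assumption: no shuffling or stratification, to keep the sketch short.
    """
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

In a full comparison, each candidate model would be fitted on the `train` indices of every fold and scored on the matching `test` indices, with the confusion matrix and AUC averaged across the five folds.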