Abstract

Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are often criticized for operating as black boxes. In this paper, we compare the predictive performance, including prediction accuracy and estimation of variable importance, of several machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated two of the most commonly used statistical models, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, though they suffered from overfitting. The RF method gave the best predictions both overall and for severe crashes, while the OP model performed worst. We compared the importance of variables for crash severity using perturbation-based sensitivity analyses. The results showed that inferences about variable importance drawn from different methods were not always consistent and should be interpreted with care.
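
The two evaluation quantities named in the abstract, the correct prediction rate per severity level and a perturbation-based sensitivity measure, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the severity labels, the feature names (`speed_limit`, `lane_count`), the toy probability function, and the choice to shift a feature by a fixed `delta` are all assumptions made purely for illustration.

```python
from collections import Counter


def correct_prediction_rates(y_true, y_pred):
    """Return (rate per severity level, overall rate) of correct predictions."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_level = {level: hits[level] / totals[level] for level in totals}
    overall = sum(hits.values()) / len(y_true)
    return per_level, overall


def perturbation_sensitivity(predict_proba, rows, feature, delta):
    """Mean change in predicted probability when one feature is shifted by delta."""
    changes = []
    for row in rows:
        perturbed = dict(row)
        perturbed[feature] = row[feature] + delta
        changes.append(predict_proba(perturbed) - predict_proba(row))
    return sum(changes) / len(changes)


# Hypothetical observed and predicted severity levels.
y_true = ["minor", "minor", "severe", "severe", "minor"]
y_pred = ["minor", "severe", "severe", "minor", "minor"]
per_level, overall = correct_prediction_rates(y_true, y_pred)


def toy_severe_probability(row):
    # Hypothetical stand-in for a fitted model: a linear score clipped to [0, 1].
    score = 0.1 + 0.01 * row["speed_limit"] - 0.02 * row["lane_count"]
    return min(1.0, max(0.0, score))


# Hypothetical crash records with assumed feature names.
rows = [
    {"speed_limit": 50, "lane_count": 3},
    {"speed_limit": 60, "lane_count": 4},
]
speed_effect = perturbation_sensitivity(toy_severe_probability, rows, "speed_limit", 5)

print(per_level, overall, speed_effect)
```

Because `perturbation_sensitivity` only needs a function that maps a record to a probability, the same measure could in principle be applied to any of the fitted models, which is what makes a comparison of variable importance across methods possible.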
