Abstract
Conventional traffic crash analysis methods often rely on highly aggregated data, which makes it difficult to understand how time-varying factors affect crash occurrence. Studies that have used data with small aggregation intervals typically analyze the effect of only a single factor. In this study, we investigate the combined effect of roadway geometry, speed distribution, and weather conditions on crash occurrence and severity by applying explainable machine learning methods to daily-level crash data collected on rural Interstate highways in Texas. Four machine learning methods (random forest, AdaBoost, XGBoost, and a deep neural network) were tested on the dataset, and XGBoost performed best on the imbalanced data. The synthetic minority oversampling technique (SMOTE) was used to mitigate the data imbalance. The XGBoost model was trained separately on all crash occurrences and on severe crash occurrences. Finally, the SHAP (SHapley Additive exPlanations) method was applied to quantify the contribution of each variable to the model’s output. The results showed that weather factors contribute significantly to all crash occurrences, whereas speed distribution factors have a stronger impact on severe crash occurrences.
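As an illustration of the oversampling step mentioned in the abstract, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the two-feature `crash_days` values are hypothetical and chosen only for illustration; the abstract does not state which implementation or settings the study used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x), Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical, made-up feature vectors for days on which a crash occurred
# (the minority class); not data from the study.
crash_days = [[0.2, 1.0], [0.3, 0.8], [0.25, 0.9], [0.4, 1.1]]
new_points = smote(crash_days, n_new=6, k=2)
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class balance.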