Abstract

Crashes that involve large trucks often result in immense human, economic, and social losses. To prevent and mitigate severe large truck crashes, the factors contributing to their severity must be identified before appropriate countermeasures can be explored. In this research, we applied three tree-based machine learning (ML) techniques, i.e., random forest (RF), gradient boost decision tree (GBDT), and adaptive boosting (AdaBoost), to analyze the factors contributing to the severity of large truck crashes. In addition, a mixed logit model was developed as a baseline against which the factors identified by the ML models were compared. The analysis was based on crash data collected from the Texas Crash Records Information System (CRIS) from 2011 to 2015. The results demonstrate that the GBDT model outperforms the other ML methods both in prediction accuracy and in identifying more of the contributing factors that the mixed logit model also found to be significant. The GBDT method can also effectively identify both categorical and numerical factors, and the directions and magnitudes of the impacts of the factors it identifies are all reasonable and explainable. Among the identified factors, driving under the influence of drugs or alcohol and driving while fatigued are the most important factors contributing to the severity of large truck crashes. In addition, the presence of curbs and medians, and of lanes and shoulders with sufficient width, can prevent severe large truck crashes.
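To make the boosting idea behind GBDT concrete, here is a minimal pure-Python sketch of gradient boosting on a binary AK versus non-AK outcome. It is not the authors' model: the depth-1 trees (stumps), the mean-residual leaf values (a simplification of the Newton-step leaf values used by common libraries), the learning rate of 0.1, the 50 rounds, and the two binary features `under_influence` and `on_shoulder` with their eight rows of toy data are all hypothetical. Each round fits a stump to the pseudo-residuals y - p, which are the negative gradient of the log-loss with respect to the log-odds.

```python
import math
from typing import List, Tuple

# A decision stump: (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares stump fitted to the pseudo-residuals r."""
    mean_r = sum(r) / len(r)
    # Fallback if no valid split exists: a constant stump (every x goes left).
    best_sse, best = float("inf"), (0, float("inf"), mean_r, mean_r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best


def predict_stump(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def fit_gbdt(X: List[List[float]], y: List[int], n_rounds: int = 50, lr: float = 0.1):
    """Gradient boosting on the logistic (log-loss) objective for a 0/1 target."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # initial prediction: log-odds of the AK share
    F = [f0] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. the log-odds is y - p.
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        s = fit_stump(X, r)
        stumps.append(s)
        F = [fi + lr * predict_stump(s, xi) for fi, xi in zip(F, X)]
    return f0, lr, stumps


def predict_proba(model, x: List[float]) -> float:
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(predict_stump(s, x) for s in stumps))


# Hypothetical binary features: [under_influence, on_shoulder]; y = 1 for an AK crash.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
model = fit_gbdt(X, y)
print(predict_proba(model, [1, 0]), predict_proba(model, [0, 0]))
```

In this toy data every `under_influence = 1` row is an AK crash, so the fitted probability for `[1, 0]` should exceed the one for `[0, 0]`. Libraries such as scikit-learn's `GradientBoostingClassifier` add deeper trees, row subsampling, and other refinements on top of this basic loop.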

Highlights

  • Since 1994, Texas has had the highest number of fatal crashes involving large trucks in the U.S. [1]. Among these large truck crashes, AK-level crashes (A denotes incapacitating-injury crashes and K denotes fatal crashes) often result in immense social and economic losses

  • According to the gradient boost decision tree (GBDT) model, “crash occurred on the shoulder” has a positive impact on the dependent variable, which means that large truck crashes that occur on the shoulder tend to be more severe than those that occur on the road

  • Three classification tree-based machine learning (ML) models, i.e., random forest (RF), AdaBoost, and GBDT models, and a mixed logit model were developed for analyzing the severity of the large truck crashes
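The GBDT highlight reads the direction of an impact off the fitted model, and the paper's outline lists partial dependence plots of the GBDT model. Partial dependence for a feature at value v is the model's average prediction after setting that feature to v in every observation; comparing v = 1 with v = 0 for a binary indicator such as "crash occurred on the shoulder" gives the sign of its impact. The sketch below accepts any fitted model exposed as a `predict` function; `toy_predict`, its coefficients, the feature order, and the five rows are hypothetical stand-ins, not results from the study.

```python
import math
from typing import Callable, List, Sequence


def partial_dependence(predict: Callable[[Sequence[float]], float],
                       X: List[List[float]], feature: int, value: float) -> float:
    """Average model output after forcing column `feature` to `value` in every row."""
    total = 0.0
    for row in X:
        modified = list(row)  # copy so the original data are untouched
        modified[feature] = value
        total += predict(modified)
    return total / len(X)


# Hypothetical stand-in for a fitted model's predicted AK probability.
# Assumed feature order: [under_influence, on_shoulder, lane_width_ft].
def toy_predict(x: Sequence[float]) -> float:
    z = -1.0 + 1.2 * x[0] + 0.6 * x[1] - 0.1 * (x[2] - 12.0)
    return 1.0 / (1.0 + math.exp(-z))


X = [[0, 0, 12.0], [1, 0, 11.0], [0, 1, 10.0], [0, 0, 12.0], [1, 1, 12.0]]
pd_road = partial_dependence(toy_predict, X, feature=1, value=0)
pd_shoulder = partial_dependence(toy_predict, X, feature=1, value=1)
print(f"PD(on_shoulder=0)={pd_road:.3f}, PD(on_shoulder=1)={pd_shoulder:.3f}")
```

A positive difference `pd_shoulder - pd_road` corresponds to the "positive impact" wording in the highlight; sweeping `value` over a grid for a numerical feature such as lane width yields the curve drawn in a partial dependence plot. For fitted scikit-learn estimators, `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay` provide the same computation.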


Summary

Introduction

Since 1994, Texas has had the highest number of fatal crashes involving large trucks in the U.S. Traditional regression models and ML-based methods each have their own advantages and limitations, which are introduced in detail in the literature review section. These models’ capabilities in analyzing the impacts of contributing factors on severe truck crashes need to be investigated. This study aimed to investigate the factors contributing to AK-level large truck crashes by using both classification tree-based ML methods and regression models. For this purpose, three types of classification tree-based ML methods, namely random forest (RF), adaptive boosting (AdaBoost), and gradient boost decision tree (GBDT), were used to identify and analyze the factors that have significant impacts on the severity of large truck crashes. The modeling results and their implications are then discussed in detail, followed by conclusions and recommendations.
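The abstract names a mixed logit model as the regression baseline. In a mixed (random-parameters) logit, coefficients vary across observations, so the outcome probability is an integral over the coefficient distribution, usually approximated by simulation: P(AK | x) ≈ (1/R) Σ_r logistic(x·β_r). The sketch below assumes a binary AK versus non-AK outcome, independent normal coefficients β = μ + σε with ε ~ N(0, 1), and pseudo-random draws; the paper's actual specification (which parameters are random, their distributions, the draw scheme) is not given in this summary, and the data and parameter values are hypothetical. Estimation would maximize the simulated log-likelihood over μ and σ with a numerical optimizer, which is omitted here.

```python
import math
import random
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulated_prob(x: Sequence[float], mu: Sequence[float], sigma: Sequence[float],
                   draws: List[List[float]]) -> float:
    """P(AK | x) averaged over random-coefficient draws beta = mu + sigma * eps."""
    total = 0.0
    for eps in draws:  # eps: one standard-normal value per coefficient
        beta = [m + s * e for m, s, e in zip(mu, sigma, eps)]
        total += logistic(sum(b * xi for b, xi in zip(beta, x)))
    return total / len(draws)


def simulated_log_likelihood(X: List[List[float]], y: List[int], mu: Sequence[float],
                             sigma: Sequence[float], n_draws: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)  # fixed seed so the same draws are reused across calls
    draws = [[rng.gauss(0.0, 1.0) for _ in mu] for _ in range(n_draws)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = simulated_prob(xi, mu, sigma, draws)
        ll += math.log(p if yi == 1 else 1.0 - p)
    return ll


# Hypothetical data: column 0 is a constant (intercept), column 1 a binary factor.
X = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
y = [1, 0, 0, 0]
ll = simulated_log_likelihood(X, y, mu=[-1.0, 0.8], sigma=[0.0, 0.5])
print(f"simulated log-likelihood: {ll:.4f}")
```

Setting every σ to zero collapses the model to an ordinary fixed-coefficient logit, which is a useful sanity check; reusing the same seeded draws across optimizer iterations keeps the simulated objective smooth in μ and σ.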

Data Description
Dependent and Independent Variables
Methodology
Mixed Logit Model
Identified AK Crash Contributing Factors
Analysis of the Impacts of the Identified Crash Contributing Factors
Partial Dependence Plots of GBDT Model
Comparison of the Mixed Logit Model and GBDT Model
The Impacts of the Identified Contributing Factors
Conclusions and Recommendations