Abstract

Road crashes cause significant traffic delays, which can lead to avoidable financial losses. The objective of this study is to predict the level of delay caused by crashes (LDC) and to discuss the significant risk factors. To ensure the efficiency and accuracy of prediction, an improved stacking model was developed using 2020 Texas crash data. The first layer integrates seven base classifiers, and three classifiers with different strengths were tested for the second layer. To improve and simplify the stacking model, three state-of-the-art methods were used: Bayesian hyperparameter optimization (BO), multiobjective feature selection (FS), and ensemble selection (ES). First, for each base classifier, BO tuned the hyperparameters and FS selected the smallest set of the most effective features. Then ES, which considers both diversity and performance, selected the fewest base classifiers needed, reducing the input to the second layer. Finally, permutation feature importance was used to interpret the best stacking model. The results indicate that the stacking model achieves superior performance on four indicators: recall, G-mean, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC-ROC). FS significantly improves the efficiency of the stacking model, and ES yields a simplified stacking model without significantly reducing performance. In addition, combining the two methods (FS and ES) tends to achieve the best performance, and permutation feature importance identifies six risk factors as contributing most to the prediction. The prediction of LDC and the analysis of the main contributing factors help road managers adjust rescue strategies in a timely manner to mitigate traffic congestion caused by crashes, thus minimizing economic losses.
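For readers unfamiliar with the indicators named above, the following is a minimal sketch of how recall, G-mean, and F1 score can be computed from predicted labels. It assumes a binary simplification of the LDC label (1 for a high-delay crash, 0 otherwise); the study's actual label levels, the function name `binary_metrics`, and the toy labels are illustrative assumptions and are not taken from the paper. AUC-ROC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return recall, specificity, G-mean, and F1 for binary labels.

    Hypothetical helper for illustration only; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty classes to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # G-mean balances accuracy on the positive and negative classes.
    g_mean = math.sqrt(recall * specificity)
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"recall": recall, "specificity": specificity,
            "g_mean": g_mean, "f1": f1}


# Made-up labels: 4 high-delay crashes (1) and 6 low-delay crashes (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

G-mean and recall are commonly reported alongside F1 when the classes are imbalanced, because plain accuracy can look high even when the minority class is rarely detected.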
