Human factors contribute in some way to almost 93% of road crashes. Because of the unavailability of good pre-crash data, the contribution of human factors to safety-critical events (SCEs) and the prediction of crashes using real-world data are lightly researched. This study quantifies predictive accuracy by harnessing unique real-world naturalistic driving study (NDS) data, which include dynamic pre-crash information about driving behavior and performance. After cleaning and preprocessing, a final subsample (N = 9,237) was used and split into training and test samples. For a consistent comparison of variable importance across statistical and machine learning (ML) models, dominance analysis was used to identify the most important predictors in the ordered Probit model. Next, three non-parametric supervised ML methods, Naïve Bayes, K-Nearest Neighbors, and Gradient Boosting Tree (GBT), were applied because of their promising prediction performance and cost-effectiveness. The overall out-of-sample prediction accuracy of the ordered Probit model was 85.75%, which was lower than that of all three ML methods. The GBT showed the highest out-of-sample prediction accuracy (91.23%). The availability of pre-crash naturalistic data helps significantly improve the prediction accuracy of SCEs, as the cumulative importance of all available human factors in the GBT classifier was 94%. For practical applications, refer to the article.