Abstract

The identification of accident hot spots is a central task of road safety management. Bayesian count data models have emerged as the workhorse method for producing probabilistic rankings of hazardous sites in road networks. Typically, these methods assume simple linear link function specifications, which, however, limit the predictive power of a model. Furthermore, extensive specification searches are precluded by complex model structures arising from the need to account for unobserved heterogeneity and spatial correlations. Modern machine learning (ML) methods offer ways to automate the specification of the link function. However, these methods do not capture estimation uncertainty, and it is also difficult to incorporate spatial correlations. In light of these gaps in the literature, this paper proposes a new spatial negative binomial model which uses Bayesian additive regression trees to endogenously select the specification of the link function. Posterior inference in the proposed model is made feasible with the help of the Pólya-Gamma data augmentation technique. We test the performance of this new model on a crash count data set from a metropolitan highway network. The empirical results show that the proposed model performs at least as well as a baseline spatial count data model with random parameters in terms of goodness of fit and site ranking ability.

Highlights

  • The identification of accident-prone locations is a core task of road safety management (Cheng et al, 2020; Huang et al, 2009; Lee et al, 2020)

  • Assessment of model fit: We evaluate the goodness of fit of the considered methods using the log pointwise predictive density (LPPD; Gelman et al, 2014) and the root mean square error (RMSE). The Python code is publicly available at https://github.com/RicoKrueger/nb_bart

  • We propose a spatial negative binomial Bayesian additive regression trees (NB-BART) model for the identification of accident hot spots in road networks
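The two fit measures named in the highlights, LPPD and RMSE, can be illustrated with a short sketch. It is not taken from the authors' repository: the function names, the mean/dispersion parameterisation of the negative binomial, the toy posterior draws, and the use of the posterior mean as the point prediction for RMSE are all assumptions made for illustration. LPPD follows the standard definition in Gelman et al (2014): the sum over sites of the log of the likelihood averaged over posterior draws.

```python
import math
import numpy as np

def nb_logpmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and dispersion r.

    Assumed parameterisation: Var(y) = mu + mu**2 / r.
    """
    lgamma = np.vectorize(math.lgamma)
    return (lgamma(y + r) - lgamma(r) - lgamma(y + 1.0)
            + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))

def lppd(loglik):
    """Log pointwise predictive density.

    loglik is an (S, N) array holding log p(y_i | theta_s) for S posterior
    draws and N sites. The average over draws is taken on the probability
    scale, using a log-sum-exp shift for numerical stability.
    """
    n_draws = loglik.shape[0]
    shift = loglik.max(axis=0)
    log_mean = shift + np.log(np.exp(loglik - shift).sum(axis=0)) - np.log(n_draws)
    return float(log_mean.sum())

def rmse(y, y_hat):
    """Root mean square error between observed and predicted counts."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

# Toy data (hypothetical): 3 sites, 4 posterior draws of the mean and dispersion.
y_obs = np.array([0, 2, 5])
mu_draws = np.array([[0.5, 2.1, 4.0],
                     [0.7, 1.8, 5.5],
                     [0.4, 2.5, 4.8],
                     [0.6, 2.0, 5.1]])
r_draws = np.array([1.5, 2.0, 1.8, 2.2])

# Broadcast the observations over draws to get the (S, N) log-likelihood matrix.
loglik = nb_logpmf(y_obs[None, :], mu_draws, r_draws[:, None])
print("LPPD:", lppd(loglik))
print("RMSE:", rmse(y_obs, mu_draws.mean(axis=0)))
```

A higher LPPD and a lower RMSE indicate better fit. Because LPPD averages likelihoods rather than log-likelihoods, it rewards a model whose whole posterior predictive distribution covers the observed counts, not only its point prediction.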


Summary

Introduction

The identification of accident-prone locations (so-called hot spots) is a core task of road safety management (Cheng et al, 2020; Huang et al, 2009; Lee et al, 2020). Accommodating flexible representations of unobserved heterogeneity in model parameters and accounting for correlations between spatial units are central themes of the recent crash count modelling literature (Cai et al, 2019a; Cheng et al, 2020; Dong et al, 2016; Heydari et al, 2016; Mannering et al, 2016; Ziakopoulos and Yannis, 2020). These flexible representations of unobserved heterogeneity are achieved at the cost of a restrictive linear specification of the link function. Whilst linear-in-parameters link functions are appealing from an interpretability perspective, an over-simplification of the relationship between the predictors and the explained variable may reduce the predictive performance of a model (Li et al, 2008; Huang et al, 2016).
