Various techniques have been proposed in the literature to account for observed and unobserved heterogeneity in crash datasets, including the finite mixture model (FMM) and hierarchical techniques. The FMM provides a flexible framework by allowing different distributions for different observations. However, the shortcoming of the standard FMM is that it cannot account for the heterogeneity within a single model structure: the data must be disaggregated into the resulting subsamples, which leads to a loss of information. A second plausible approach is to use a hierarchical technique that accounts for data heterogeneity by grouping observations on explanatory variables chosen through engineering intuition. In the traffic safety context, some researchers have grouped by season, while others have used highway system or even gender. However, several questions arise. Are the observations within the same hierarchy homogeneous? Are all observations in different clusters heterogeneous? And what about other variables? Although results in the literature show that accounting for the structure of the dataset yields an acceptable intraclass correlation (ICC) and a significant reduction in the deviance information criterion (DIC), there is no justification for using those specific hierarchies and rejecting others. A more reasonable approach is to let the algorithm identify the best distributions based on the provided parameters and assign observations to the corresponding mixtures. Under that approach, observations that belong to different subjective hierarchies (e.g., winter versus summer) but are found to be similar are placed in the same cluster. We therefore propose a methodology that uses the FMM to derive an objective hierarchy for the hierarchical technique.
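As a rough illustration of letting the data define the clusters, the sketch below fits a finite mixture by maximum likelihood (EM) and assigns each observation to its most probable component. It is a minimal sketch under stated assumptions, not the model used in the study: the components are intercept-only Poisson distributions with no covariates, and the function name `fit_poisson_mixture` and its parameters are hypothetical.

```python
import numpy as np


def fit_poisson_mixture(counts, n_components=2, n_iter=200, tol=1e-8):
    """Fit a mixture of Poisson distributions by EM (maximum likelihood).

    Hypothetical helper for illustration: intercept-only components,
    no covariates. Returns (weights, rates, hard cluster labels).
    """
    y = np.asarray(counts, dtype=float)
    # Deterministic start: spread the initial rates over data quantiles.
    # If the quantiles coincide, the components cannot separate.
    probs = (np.arange(n_components) + 0.5) / n_components
    rates = np.quantile(y, probs) + 0.1
    weights = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight_k * Poisson(y_i | rate_k)), omitting log(y_i!),
        # which is the same for every component and cancels out.
        log_joint = np.log(weights) + y[:, None] * np.log(rates) - rates
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])  # responsibilities
        # M-step: update mixing weights and component rates.
        nk = resp.sum(axis=0)
        weights = np.clip(nk / y.size, 1e-12, None)
        weighted_sums = (resp * y[:, None]).sum(axis=0)
        rates = np.clip(weighted_sums / np.maximum(nk, 1e-12), 1e-12, None)
        # Stop when the log-likelihood (up to a constant) stops improving.
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Hard assignment: each observation goes to its most probable component.
    labels = resp.argmax(axis=1)
    return weights, rates, labels
```

The returned `labels` could then serve as the grouping variable of a hierarchical model, in place of a subjective grouping such as season or highway system.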
Because of the label switching problem of the FMM in the Bayesian context, the FMM was first fitted by maximum likelihood estimation, and the resulting assignments of observations were then used in the final analysis. The DIC indicated a significant improvement in model fit compared with a subjectively assigned hierarchy based on highway system. Additionally, whereas the subjective model resulted in a very low ICC because of the substantial heterogeneity in the dataset, the implemented methodology resulted in an acceptable ICC (0.3), justifying the use of a hierarchy. This Bayesian hierarchical finite mixture model (BHFMM) is one of the earliest applications of its kind in traffic safety studies. The findings have important implications for future studies seeking to account for greater heterogeneity in crash datasets based on the distance of observations to each cluster.
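For readers unfamiliar with the ICC, the sketch below shows the usual variance-ratio definition: the share of total variance attributable to differences between clusters. It assumes a random-intercept model with a between-cluster variance and a within-cluster (residual) variance. The abstract does not state how the ICC was computed, so the function name `icc_from_variances` and the inputs are hypothetical.

```python
import numpy as np


def icc_from_variances(var_between, var_within):
    """ICC as between-cluster variance divided by total variance.

    Accepts scalars or arrays (e.g., posterior draws of the two variance
    components) and computes the ratio elementwise.
    """
    vb = np.asarray(var_between, dtype=float)
    vw = np.asarray(var_within, dtype=float)
    return vb / (vb + vw)
```

With illustrative variance components of 3.0 between clusters and 7.0 within clusters, the ratio is 0.3. A value near zero would indicate that the grouping explains little of the variation, which weakens the case for a hierarchy.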