Abstract

The analysis of highway accident data is largely dominated by traditional statistical methods (standard regression-based approaches), advanced statistical methods (such as models that account for unobserved heterogeneity), and data-driven methods (artificial intelligence, neural networks, machine learning, and so on). These methods have mostly been applied to data from observed crashes, which can make causality difficult to uncover because individuals who are inherently riskier than the population as a whole may be over-represented in the data. In addition, when and where individuals choose to drive could affect analyses that use real-time data, since the population of observed drivers could change over time. This issue, the nature of the data, and the implementation target of the analysis imply that analysts must often trade off the predictive capability of the resulting analysis against its ability to uncover the underlying causal nature of crash-contributing factors. The selection of the data-analysis method is often made without full consideration of this trade-off, even though there are potentially important implications for the development of safety countermeasures and policies. This paper discusses the issues involved in this trade-off with regard to specific methodological alternatives and gives researchers a better understanding of the trade-offs that are often inherently made in their analysis.

