Abstract

Machine learning (ML) model interpretability has attracted much attention recently, given the promising performance of ML methods in crash frequency studies. Extracting accurate relationships between risk factors and crash frequency is important for understanding the causal effects of risk factors and for developing safety countermeasures. However, no study has comprehensively summarized ML model interpretation methods and provided guidance for safety researchers and practitioners. This research aims to fill that gap. Model-based and post-hoc ML interpretation methods are critically evaluated and compared to assess their suitability for crash frequency modeling. These methods include classification and regression trees (CART), multivariate adaptive regression splines (MARS), Local Interpretable Model-agnostic Explanations (LIME), Local Sensitivity Analysis (LSA), Partial Dependence Plots (PDP), Global Sensitivity Analysis (GSA), and SHapley Additive exPlanations (SHAP). Model-based interpretation methods cannot reveal the detailed interaction relationships among risk factors. LIME can only be used to analyze the effects of a risk factor at the individual-prediction level. LSA and PDP assume that risk factors are independently distributed. Both GSA and SHAP can account for potential correlation among risk factors, but only SHAP can visualize the detailed relationships between crash outcomes and risk factors. This study also demonstrates the potential and benefits of using ML and SHAP to derive Crash Modification Factors (CMFs). Finally, it is emphasized that statistical and ML models may not directly differentiate causation from correlation; understanding this distinction is critical for developing reliable safety countermeasures.
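To make the SHAP-based workflow referenced above concrete, the sketch below shows how SHAP values might be computed for a gradient-boosted Poisson crash frequency model. The data, feature names (AADT, speed_limit, curve_density, shoulder_width), and model settings are entirely hypothetical and chosen for illustration; they are not the study's actual dataset or specification.

```python
# Illustrative sketch only: synthetic crash data and hypothetical risk
# factors, not the authors' actual dataset or model.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb

rng = np.random.default_rng(42)
n = 2000

# Hypothetical roadway-segment risk factors
X = pd.DataFrame({
    "AADT": rng.uniform(1_000, 50_000, n),           # traffic volume
    "speed_limit": rng.choice([40, 50, 60, 80], n),  # km/h
    "curve_density": rng.uniform(0, 5, n),           # curves per km
    "shoulder_width": rng.uniform(0.5, 3.0, n),      # metres
})

# Synthetic crash counts drawn from an assumed log-linear Poisson process
log_mu = (-6.0 + 0.9 * np.log(X["AADT"])
          + 0.02 * X["speed_limit"]
          + 0.15 * X["curve_density"]
          - 0.20 * X["shoulder_width"])
y = rng.poisson(np.exp(log_mu))

# Gradient-boosted Poisson regression for crash frequency
model = xgb.XGBRegressor(objective="count:poisson", n_estimators=300,
                         max_depth=3, learning_rate=0.05)
model.fit(X, y)

# SHAP decomposes each prediction into per-factor contributions, giving
# both global importance and the direction of each effect
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)

# Dependence plot: the detailed relationship between one risk factor and
# the crash outcome, with interaction effects shown through colouring
shap.dependence_plot("curve_density", shap_values, X)
```

Under the same assumptions, a CMF-like estimate for a countermeasure could be approximated by comparing model predictions before and after changing the corresponding feature (e.g., widening shoulder_width), though such a ratio reflects the model's learned associations, not necessarily a causal effect.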
