With the growing development and deployment of automated vehicles (AVs), it is crucial to understand the risks and contributing factors associated with collisions involving AVs. California has a large body of publicly available AV testing data because the California Department of Motor Vehicles (DMV) requires all automated vehicle operators to report collisions of any severity. However, this information is reported on specific forms, and aggregating data from these reports is laborious. This study creates an automated data extraction system for these reports and analyses collision characteristics using logistic regression models as well as XGBoost models with SHapley Additive exPlanations (SHAP) interpretation. Additionally, these characteristics are matched with those of non-automated vehicles (non-AVs) in the same region. The results indicate that rear-end collisions are the most common collision type among currently deployed AVs. The analysis further reveals an increased likelihood of injury-prone rear-end collisions at intersections for AVs compared to non-AVs. Transportation policymakers and researchers should take these safety concerns into account when addressing AV deployment and developing appropriate measures to mitigate collision risks in mixed fleet conditions.