Abstract

Walking and bicycling are lauded for their negative net carbon impact and for their health benefits. However, national crash statistics suggest that pedestrians are disproportionately harmed in vehicle–pedestrian conflicts. Although automated transportation is anticipated to increase overall safety in the future, multiple incidents involving automated vehicles have been reported recently, indicating that the technology needs more training on real-world scenarios and conflicts. This research is motivated by the need for contextual data on potential conflict scenarios in mixed traffic and on the associated levels of harm; to address this need, we use a national police-reported crash dataset, CRSS. Our study uses a new gradient boosting algorithm, XGBoost, to identify important features among a host of seemingly significant variables. We compare the performance of XGBoost with the more frequently used random forest method and find that XGBoost is more reliable and robust for handling an unbalanced and sparse dataset like crash data, and that the features it extracts are more closely aligned with findings from previous research on the topic. We also compare feature importance between NASS-GES and CRSS, two national crash databases with different sampling strategies but the same objective, and find that the sampling strategy influences which features are ranked as important. We then use the features extracted with XGBoost in a multiclass logistic regression to quantify their effect on different levels of pedestrian injury. Our findings indicate that speed limit, light conditions, pre-crash movements, and pedestrian location are important contributors to crash severity, along with driver distraction and impairment.
