Abstract

The proliferation of digital textual archives in the transportation safety domain makes it imperative for the inventions of efficient ways of extracting information from the textual data sources. The present study aims at utilizing crash narratives complemented by crash metadata to discern the prevalence and co-occurrence of themes that contribute to crash incidents. Ten years (2009–2018) of Michigan traffic fatal crash narratives were used as a case study. The structural topic modeling (STM) and network topology analysis were used to generate and examine the prevalence and interaction of themes from the crash narratives that were mainly categorized into pre-crash events, crash locations and involved parties in the traffic crashes. The main advantage of the STM over the other topic modeling approaches is that it allows the researchers to discover themes from documents and estimate how the topic relates to the document metadata. Topics with the highest prevalence for the angle, head-on, rear-end, sideswipe and single motor vehicle crashes were crash at stop-sign, crossing the centerline, unable to stop, lane change maneuver and run-off-road crash, respectively. Eigenvector centrality measure in network topology showed that event-related topics were consistently central in articulating the crash occurrence. The centrality and association between topics varied across crash types. The efficacy of generated topics in classifying crashes by type was tested using a machine learning algorithm, Random Forest. The classification accuracy in the held-out sample ranged between 89.3 % for sideswipe crashes to 99.2 % for single motor vehicle crashes. High classification accuracy suggests that automation of crash typing and consistency checks can be accomplished effectively by using extracted latent themes from the crash narratives.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call