A large language model framework to uncover underreporting in traffic crashes

Cristian Arteaga, Jeewoong Park

doi:10.1016/j.jsr.2024.11.009

Abstract

Introduction: Crash reports support the development of traffic safety countermeasures, but they often underreport crucial crash factors because entries are miscoded during data collection. Current practice relies on manual correction, which is time-consuming and error-prone, especially with large data volumes. To address these hurdles, we develop a framework that uses Large Language Models (LLMs) to analyze traffic crash narratives and uncover underreported crash factors.

Method: The framework integrates procedures for prompt definition, selection of LLM generation parameters, output parsing, and underreporting determination. For evaluation, we present a case study on identifying underreported alcohol involvement in traffic crashes. We investigate the framework’s identification accuracy in relation to different underlying LLMs (i.e., ChatGPT, Flan-UL2, and Llama-2), prompt framings (i.e., explicit vs. implicit matching), and generation parameters (i.e., sampling temperature and nucleus probability). Our validation dataset consists of 500 crash reports from the State of Massachusetts.

Results: The developed framework achieves a recall of up to 1.0 and a precision of up to 0.93, indicating successful retrieval of underreported instances. These findings indicate that the framework addresses a critical gap in the existing traffic safety analysis workflow by enabling safety analysts to uncover underreporting in crash data efficiently and accurately, without the need for extensive expertise in natural language processing.

Practical Applications: The developed approach offers unprecedented opportunities to maximize the quality and comprehensiveness of traffic crash records, paving the way for more effective countermeasure development.
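The four stages named in the abstract (prompt definition, generation parameters, output parsing, underreporting determination) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the prompt wording, the `CrashReport` fields, the function names, and the `keyword_stub` stand-in are hypothetical and are not taken from the paper. In a real setup, `generate` would wrap a call to the chosen LLM with a fixed sampling temperature and nucleus probability.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical prompt wording; the abstract does not give the actual prompts.
PROMPT_TEMPLATE = (
    "Read the crash narrative below and answer with a single word, "
    "Yes or No: was alcohol involved in the crash?\n\n"
    "Narrative: {narrative}\nAnswer:"
)


@dataclass
class CrashReport:
    narrative: str       # free-text description of the crash
    coded_alcohol: bool  # value recorded in the structured alcohol field


def build_prompt(narrative: str) -> str:
    """Prompt definition: embed the narrative in a fixed question."""
    return PROMPT_TEMPLATE.format(narrative=narrative)


def parse_answer(text: str) -> Optional[bool]:
    """Output parsing: map free-form model text to True, False, or None."""
    words = text.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    return None  # unparseable output is left undecided


def is_underreported(report: CrashReport, llm_answer: Optional[bool]) -> bool:
    """Underreporting determination: narrative says yes, coded field says no."""
    return llm_answer is True and not report.coded_alcohol


def find_underreported(
    reports: Iterable[CrashReport], generate: Callable[[str], str]
) -> List[bool]:
    """Run every report through prompt -> generate -> parse -> compare."""
    return [
        is_underreported(r, parse_answer(generate(build_prompt(r.narrative))))
        for r in reports
    ]


def precision_recall(
    predicted: List[bool], actual: List[bool]
) -> Tuple[float, float]:
    """Precision and recall of the flags against manually verified labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def keyword_stub(prompt: str) -> str:
    """Stand-in for an LLM call (NOT a language model): keyword lookup only."""
    narrative = prompt.split("Narrative:", 1)[1]
    return "Yes." if "alcohol" in narrative.lower() else "No."
```

Passing `keyword_stub` as `generate` exercises the pipeline end to end without a model; swapping in a real LLM call only changes that one argument, which mirrors the abstract's comparison across different underlying models and generation parameters.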

Journal: Journal of Safety Research

Similar Papers

Data linkage: An untapped resource for reducing serious traffic injuries in fast developing countries.
Gordon S Smith
Journal of Local and Global Health Science | VOL. 2015
12 Nov 2015

#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure
Suman Lama ... Luca Neri
Nephrology Dialysis Transplantation | VOL. 39
23 May 2024

Relationship between mobility and road traffic injuries during COVID-19 pandemic—The role of attendant factors
Kandaswamy Paramasivan ... Venkatesh Mohan Sharma
20 May 2022

Factors associated with risky driving behaviors for road traffic crashes among professional car drivers in Bahirdar city, northwest Ethiopia, 2016: a cross-sectional study
Tesfaye Hambisa Mekonnen ... Yitayew Ashagrie Tesfaye
Environmental Health and Preventive Medicine | VOL. 24
09 Mar 2019
