Traffic crashes are a critical safety concern. Many researchers have sought to improve traffic safety by applying diverse statistical and machine learning models to a wide range of safety topics. Yet the data elements contained in police-reported crash narratives are not routinely analyzed alongside coded, structured crash data. In recent years, many researchers have investigated the unstructured text of traffic crash narratives, but most of these studies are basic text mining applications, and their datasets are often limited in size. This study applied an advanced language model, Bidirectional Encoder Representations from Transformers (BERT), to classify traffic injury types using a dataset of over 750,000 unique crash narrative reports. The models achieved a predictive accuracy of 84.2% ± 0.5 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.93 ± 0.06 per class. Overall, the findings can help safety engineers and analysts determine the causes of crashes. Classifying crash injury types with a language model such as BERT is a valuable tool for identifying additional factors that contribute to crashes, which can reveal new areas for safety countermeasures and support the development of new safety strategies.