Abstract
Crash data is the foundation of traffic safety analysis and can help experts identify the causes of crashes and propose corresponding countermeasures. In China, the accident reporting form allows only one crash cause to be reported for each crash, selected from prespecified crash cause codes. This restriction may lead to inaccurate crash records, especially for state-related crashes. Crash narratives, the responding officer's written accounts of what occurred before, during and after a crash, contain considerable free-form information about how the crash occurred. This study investigated the direct contributing factors behind state-related crashes by developing natural language processing and deep-learning models based on 1625 state-related crash narratives. Based on the direct causative factors described in the narratives, the state-related crashes were labelled as speed related, turning related or other causes. The narratives were then vectorized for model training and frequency analysis. Text-CNN, LSTM, GRU and SVM models were applied to reclassify the vectorized crash narratives. The text-CNN model achieved the best text classification performance, with a micro-average AUC of 0.90. These results can promote the use of crash narratives and help identify the actual causes hidden behind some inaccurate crash cause designations.
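The abstract reports performance as a micro-average AUC over the three crash-cause classes. Here is a minimal sketch of that metric, assuming each classifier outputs one probability per class for every narrative. Micro-averaging binarizes every (narrative, class) pair one-vs-rest and pools all pairs into a single binary ROC problem. The function names, class labels and scores below are hypothetical illustrations, not the authors' code or data.

```python
from itertools import product


def binary_auc(labels, scores):
    """Binary ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive scores higher than a randomly chosen negative
    (ties count as 1/2). This equals the trapezoidal area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, y_score, classes):
    """Micro-average AUC: flatten the one-vs-rest label matrix and the score
    matrix, then compute a single binary AUC over all pooled pairs."""
    flat_labels, flat_scores = [], []
    for true_cls, probs in zip(y_true, y_score):
        for k, cls in enumerate(classes):
            flat_labels.append(1 if true_cls == cls else 0)
            flat_scores.append(probs[k])
    return binary_auc(flat_labels, flat_scores)


# Hypothetical example: three classes mirroring the abstract's labels,
# with made-up per-class probabilities for four narratives.
CLASSES = ["speed", "turning", "other"]
Y_TRUE = ["speed", "turning", "other", "speed"]
Y_SCORE = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
MICRO_AUC = micro_average_auc(Y_TRUE, Y_SCORE, CLASSES)
```

On this toy input, 30.5 of the 32 positive-negative pairs are correctly ordered, one of them as a tie. The pooling step is what distinguishes micro-averaging from macro-averaging, which would instead compute one AUC per class and take their mean.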