Abstract

Online media reports provide valuable information for road traffic injury prevention, but technical challenges in data acquisition and processing limit the analysis and interpretation of such data. Integrating injury epidemiology theory and big data technology, we developed a data platform consisting of four layers (data acquisition, data processing, application, and data storage) that automatically collects reports on road traffic crashes from Chinese online media every 24 h. We built a text classification model based on Bidirectional Encoder Representations from Transformers (BERT), trained on 20,000 manually annotated news stories, and then used natural language processing algorithms to extract 27 structured variables from the news reports. The BERT-based text classification model achieved an accuracy of 0.9271, and information extraction accuracy exceeded 80% for 22 of the 27 variables. As of November 30, 2021, the platform had collected 244,650 eligible media reports covering all 333 prefecture-level divisions in China. These reports came from 37,073 websites or social media accounts located in all 31 provinces and over 98% of prefecture-level divisions. Data availability varied widely across the 27 structured variables, ranging from 0.9% to 100%. Additionally, applying natural language processing techniques to the report texts identified 645,787 potentially relevant keywords. Platform data were highly correlated with road traffic police data in province-level crash statistics (crashes, rs = 0.799; non-fatal injuries, rs = 0.802; deaths, rs = 0.775). Notably, the platform provides valuable data, such as crashes involving electric vehicles, that are not included in official road traffic crash statistics. The new automated data platform shows great potential for the timely detection of emerging characteristics of road traffic crashes. Further research is needed to improve the platform and apply it to real-time monitoring and analysis of road traffic injuries.
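The agreement with police data is reported as Spearman rank correlations (rs) between province-level counts. As a minimal sketch of how such a coefficient can be computed, assuming two equal-length sequences of per-province counts (one from the platform, one from police statistics), Spearman's rho is the Pearson correlation of the two rank vectors, with tied values given their average rank. The function names below are hypothetical, and the counts in the usage example are illustrative placeholders, not data from the study.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative per-province crash counts (hypothetical, not from the study).
platform_counts = [120, 45, 300, 80, 45]
police_counts = [900, 410, 2100, 700, 500]
print(round(spearman_rho(platform_counts, police_counts), 3))  # 0.975
```

Because only ranks enter the calculation, a platform that systematically captures fewer crashes than the police can still correlate strongly, provided it orders provinces similarly. In practice, `scipy.stats.spearmanr` computes the same coefficient along with a p-value.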
