Abstract

With the rapid advancement of Internet technology and the increasing volume of police reports, relying solely on extensive human labor and traditional natural language processing methods for key element extraction has become impractical. Applying advanced technologies such as large language models to improve the effectiveness of police report extraction has become an inevitable trend in the field of police data analysis. This study addresses the characteristics of Chinese police reports and the need to extract key elements by employing large language models specific to the public security domain for entity extraction. Several lightweight (6/7b) open-source large language models were tested as base models. To enhance model performance, LoRA fine-tuning was employed, combined with data engineering approaches. A zero-shot data augmentation method based on ChatGPT and prompt engineering techniques tailored for police reports were proposed to further improve model performance. The key police report data from a certain city in 2019 were used as a sample for testing. Compared to the base models, prompt engineering improved the F1 score by approximately 3%, while fine-tuning led to an increase of 10–50% in the F1 score. After fine-tuning and comparing different base models, the Baichuan model demonstrated the best overall performance in extracting key elements from police reports. Using the data augmentation method to double the data size resulted in an additional 4% increase in the F1 score, achieving optimal model performance. Compared to the fine-tuned universal information extraction (UIE) large language model, the police report entity extraction model constructed in this study improved the F1 score for each element by approximately 5%, with a 42% improvement in the F1 score for the “organization” element. Finally, ChatGPT was employed to align the extracted entities, resulting in a high-quality entity extraction outcome.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.