RoBERT-Agr: An Entity Relationship Extraction Model of Massive Agricultural Text Based on the RoBERTa and CRF Algorithm

Tianyue Chen, Xiang Li, Di Ouyang, Yongqiang Qian, Jingbo Zhao, Lan Huang, Yaojun Wang, Xiaojin Chen, Shihao Dong

doi:10.1109/icbda57405.2023.10105090

Abstract

Joint entity recognition and relation extraction is a complex task in natural language processing. It is essential to information extraction and can be applied to knowledge graph construction and question-answering systems. Existing problems in agricultural text processing include low text utilization, difficult entity recognition and relation extraction, and low accuracy. To improve the utilization of agricultural text and to perform joint entity recognition and relation extraction on it, this study constructs the agricultural entity-relationship dataset AgriRE by collecting existing agricultural texts from the Internet and defining rules for corpus annotation. The AgriRE dataset defines a primary entity and six types of relationships: alias, damaged position, genus, family, distribution area, and damaged crops. The dataset contains 177454 data samples, covering 1798 agricultural entities and 12789 agricultural relationships. Based on the AgriRE dataset, this study proposes RoBERT-Agr, a joint entity recognition and relation extraction model that combines RoBERTa, whole word masking (WWM), and a conditional random field (CRF). The experimental results show that RoBERT-Agr achieves the highest F1 score among the existing advanced models compared. After training, validation, and testing, the model's classification accuracy reaches 96.18%, and its F1 score on the AgriRE test set is 95.72%.
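
To make the relation schema and the reported F1 metric concrete, the following is a minimal Python sketch of how the six AgriRE relation types could be represented as (head, relation, tail) triples and scored. It is an illustration under stated assumptions, not the authors' implementation: the names `Relation`, `Triple`, and `triple_prf` are hypothetical, the example triples are made up, and the abstract does not state that F1 is computed by exact triple matching.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The six relation types listed in the abstract."""
    ALIAS = "alias"
    DAMAGED_POSITION = "damaged position"
    GENUS = "genus"
    FAMILY = "family"
    DISTRIBUTION_AREA = "distribution area"
    DAMAGED_CROPS = "damaged crops"


@dataclass(frozen=True)
class Triple:
    """One extracted fact: primary entity, relation type, related value."""
    head: str
    relation: Relation
    tail: str


def triple_prf(predicted, gold):
    """Precision, recall, and F1 under exact triple matching.

    Assumption: a predicted triple counts as correct only if head,
    relation, and tail all match a gold triple exactly.
    """
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data, for illustration only.
gold = [
    Triple("pest A", Relation.DAMAGED_CROPS, "crop X"),
    Triple("pest A", Relation.DAMAGED_POSITION, "leaf"),
]
predicted = [
    Triple("pest A", Relation.DAMAGED_CROPS, "crop X"),
    Triple("pest A", Relation.DAMAGED_POSITION, "stem"),  # wrong tail
]

precision, recall, f1 = triple_prf(predicted, gold)
```

Freezing the dataclass makes triples hashable, so set intersection can count exact matches directly.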

Full Text