Abstract

This paper proposes a novel model for named entity recognition of Chinese crop diseases and pests. The model is intended to solve the problems of uneven entity distribution, incomplete recognition of complex terms, and unclear entity boundaries. First, a robustly optimized BERT pre-training approach-whole word masking (RoBERTa-wwm) model is used to extract diseases and pests’ text semantics, acquiring dynamic word vectors to solve the problem of incomplete word recognition. Adversarial training is then introduced to address unclear boundaries of diseases and pest entities and to improve the generalization ability of models in an effective manner. The context features are obtained by the bi-directional gated recurrent unit (BiGRU) neural network. Finally, the optimal tag sequence is obtained by conditional random fields (CRF) decoding. A focal loss function is introduced to optimize conditional random fields (CRF) and thus solve the problem of unbalanced label classification in the sequence. The experimental results show that the model’s precision, recall, and F1 values on the crop diseases and pests corpus reached 89.23%, 90.90%, and 90.04%, respectively, demonstrating effectiveness at improving the accuracy of named entity recognition for Chinese crop diseases and pests. The named entity recognition model proposed in this study can provide a high-quality technical basis for downstream tasks such as crop diseases and pests knowledge graphs and question-answering systems.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call