Abstract

A high-quality agricultural named entity recognition (NER) model can provide effective support for agricultural information extraction, semantic retrieval, and other tasks. However, existing models ignore the latent features of Chinese characters, so the internal semantics of the characters are lost. Moreover, agricultural text sequences are long, which makes it difficult for models to capture long-distance dependencies. To solve these problems, RSA-CANER, an agricultural NER model with a self-attention mechanism that incorporates the latent features of Chinese characters, is proposed. First, the model takes character features and the latent features of Chinese characters (radicals and strokes) as input to enrich semantic information. Character features are obtained from the ALBERT pre-trained model, radical features are extracted with a convolutional neural network (CNN), and stroke features are extracted with a bidirectional long short-term memory network (BiLSTM). Then, a BiLSTM produces the sequence feature matrix, and a self-attention mechanism further enhances the model's ability to capture long-distance dependencies. Finally, the globally optimal label sequence is generated by a conditional random field (CRF) model. The model obtains an F1-score of 95.56. The experimental results show that the model learns semantic information at the fine-grained levels of radicals and strokes, enriches the vector representation of target words, and achieves better recognition accuracy than the other models compared, improving the generalization ability of the model.
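The two steps of the pipeline that are easiest to illustrate are combining the per-character features and applying self-attention over the sequence. The following is a minimal sketch only, not the authors' implementation: it assumes the three feature vectors are concatenated per character and that the attention is scaled dot-product attention with identity query/key/value projections, neither of which the abstract states. The ALBERT, CNN, BiLSTM, and CRF components are omitted, and all function names and toy values are hypothetical.

```python
import math

def concat_features(char_vecs, radical_vecs, stroke_vecs):
    """Concatenate the three feature vectors position by position (assumed fusion)."""
    return [c + r + s for c, r, s in zip(char_vecs, radical_vecs, stroke_vecs)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    d = len(seq[0])
    weights = []
    for q in seq:
        # Score this position against every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted average of all input vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, seq)) for j in range(d)]
        for row in weights
    ]
    return output, weights

# Toy values for a three-character sequence (hypothetical, not from the paper).
char_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
radical_vecs = [[0.5], [0.2], [0.5]]
stroke_vecs = [[0.1], [0.9], [0.1]]

features = concat_features(char_vecs, radical_vecs, stroke_vecs)
attended, weights = self_attention(features)
```

Because every position is scored against every other position directly, the attention weights do not decay with distance in the sequence, which is the property the abstract relies on for long agricultural texts.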
