Highlights
• An agricultural NER model based on contextual embeddings and glyph features was proposed.
• A 3D CNN-based model was proposed to capture contextual morphological features.
• A weighted method was proposed to extract the more important local context features.
• Experiments showed that the model improves the ability to recognize rare entities.

Abstract
In recent years, deep learning has greatly improved the performance of named entity recognition (NER) models in many fields, including the agricultural domain. However, most existing works use only word embedding models that generate context-independent embeddings, which limits their ability to model polysemous words. Moreover, the abundant morphological information in agricultural texts has not been fully exploited, and local context information needs further extraction. To address these issues, a novel model based on enhanced contextual embeddings and glyph features was proposed. First, contextual embeddings were generated dynamically by a Bidirectional Encoder Representations from Transformers (BERT) model fine-tuned on a domain-specific corpus (e.g., agricultural texts), and multi-granularity information was obtained from the layers of BERT; the contextual embeddings therefore contain both domain-specific knowledge and multi-grained semantic information. Second, a novel 3-dimensional convolutional neural network (3D CNN)-based framework was designed to capture contextual glyph features for each character from the image perspective. Third, a channel-wise fusion architecture was introduced to further improve the ability of the convolutional neural network layer to capture local context features. Experimental results showed that the proposed model achieved the best F1-scores of 95.02% and 96.51% on the AgCNER and Resume datasets, respectively, which indicates the effectiveness and generalization ability of the model for identifying entities in cross-domain texts.
An ablation study covering multiple aspects also demonstrated the superior performance of the proposed model.
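As a minimal sketch of the multi-granularity idea described above (not the paper's actual implementation), the hidden states of several BERT layers can be combined with softmax-normalized learnable weights into one embedding per token. The layer count, hidden size, and equal-weight initialization below are all assumptions for illustration:

```python
import numpy as np

def fuse_layers(layer_states, layer_logits):
    """Weighted sum of per-layer hidden states.

    layer_states: (num_layers, seq_len, hidden) hidden states from BERT layers
    layer_logits: (num_layers,) learnable scalars, softmax-normalized into weights
    returns: (seq_len, hidden) fused contextual embeddings
    """
    w = np.exp(layer_logits - layer_logits.max())  # numerically stable softmax
    w = w / w.sum()
    # weighted sum over the layer axis -> (seq_len, hidden)
    return np.tensordot(w, layer_states, axes=(0, 0))

rng = np.random.default_rng(0)
states = rng.normal(size=(4, 5, 8))  # assumed: 4 layers, 5 tokens, hidden size 8
logits = np.zeros(4)                 # equal logits -> plain average over layers
fused = fuse_layers(states, logits)
```

With equal logits the fusion reduces to a simple layer average; during training the logits would be learned so that the most useful granularities dominate.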
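The glyph-feature idea, treating each character as a small bitmap and convolving across a stack of neighboring characters' glyphs, can be sketched with a minimal single-filter 3D convolution. The glyph size, window of 3 characters, valid padding, and averaging kernel are assumptions, not the paper's configuration:

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution with 'valid' padding, stride 1.

    volume: (D, H, W) stack of neighboring characters' glyph bitmaps
    kernel: (kd, kh, kw) one 3D filter spanning depth (characters) and space
    """
    D, H, W = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for d in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[d, i, j] = np.sum(volume[d:d+kd, i:i+kh, j:j+kw] * kernel)
    return out

glyphs = np.ones((3, 8, 8))            # assumed: 3 neighboring chars, 8x8 glyphs
kernel = np.full((3, 3, 3), 1.0 / 27)  # averaging filter spanning all 3 glyphs
feat = conv3d_valid(glyphs, kernel)    # (1, 6, 6) contextual glyph feature map
```

Because the kernel's depth equals the character window, each output value mixes spatial strokes from all neighboring glyphs, which is what makes the features "contextual" rather than per-character.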