Fusion of ALBERT and BiGRU-Attention-CRF Models for Named Entity Recognition in Lakes

刘 润,李 志强,朱 道恒

doi:10.57237/j.se.2022.01.003

Abstract

China has many widely distributed lakes, so comprehensive lake information is important for advancing major water network projects, the ecological and environmental restoration of rivers and lakes, and smart water conservancy construction. This paper uses a deep learning model to identify and extract lake entities, providing a reference for mining useful information from lake texts. First, a Named Entity Recognition (NER) method and annotation specification are defined, and a corpus of lake texts is built. Second, because the small size of the self-built corpus and the sparsity of its text information may degrade model performance and recognition quality, a lightweight pre-trained model (ALBERT) is introduced to generate high-quality word vectors that serve as the input-layer feature vectors of a bidirectional gated recurrent unit with a conditional random field (BiGRU-CRF) model, and an attention mechanism is incorporated into the model to weight the semantic features of entities and improve entity feature extraction. Finally, extensive validation experiments are carried out, and the recognition performance of four deep learning models is compared. The results show that the ALBERT-BiGRU-Attention-CRF model performs well on the dataset as a whole, with precision, recall and F1 reaching 91.26%, 90.38% and 90.81%, respectively. In addition, the model outperforms the other comparison models on all four entity types that occur at high frequency.
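
As a quick sanity check on the reported scores, the sketch below recomputes F1 as the harmonic mean of the other two figures. It assumes the first of the three reported numbers is precision in the usual NER sense; the function name `f1_score` and the variable names are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
# Assumption: the first figure is precision, not token-level accuracy.
reported_p, reported_r, reported_f1 = 91.26, 90.38, 90.81

computed_f1 = f1_score(reported_p, reported_r)
print(f"F1 recomputed from P and R: {computed_f1:.2f}")  # prints 90.82
```

The recomputed value differs from the reported 90.81% by about 0.01 percentage points, which is within what rounding of the two inputs could plausibly explain.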
