Abstract

Named entity recognition has a variety of applications in journalism, where it can extract relevant information from voluminous daily news reports. However, its applicability is limited: existing models do not learn dynamic word vectors and are complex. This paper builds on ALBERT, the lightweight dynamic word-vector generation model proposed by Google, combining it with a Bidirectional Long Short-Term Memory network (BiLSTM) and a Conditional Random Field (CRF) to form the ALBERT-BiLSTM-CRF model. Using the 2014 edition of the People's Daily published on the Internet as the primary data set, the paper compares this model against traditional statistical models and classic NLP models. The experimental results show that ALBERT-BiLSTM-CRF has a clear advantage over classic natural language processing (NLP) models, increasing the recognition accuracy and recall rate for named entities in news text. The model's accuracy and recall on the test data set reached 94.49% and 89.50%, respectively, while its small size allows for lightweight deployment.

Keywords: Chinese named entity recognition, Conditional random field, ALBERT model, Dynamic word vector
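The CRF layer named in the abstract is what turns the BiLSTM's per-token tag scores into a globally consistent tag sequence. As a minimal sketch (not the paper's implementation), the following pure-Python Viterbi decoder shows that role: given emission scores (one score per tag per token, as a BiLSTM head would produce) and tag-to-tag transition scores (as a CRF learns), it finds the highest-scoring tag path. The toy scores in the usage example are invented for illustration.

```python
# Sketch of CRF decoding (Viterbi) over BiLSTM emission scores.
# All numbers below are illustrative, not from the paper.

def viterbi(emissions, transitions):
    """emissions: per-token lists of tag scores (from the BiLSTM head);
    transitions[i][j]: learned score for moving from tag i to tag j (CRF).
    Returns the highest-scoring tag index sequence."""
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each tag
    back = []                           # backpointers, one list per later token
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # best previous tag to transition into tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        back.append(ptrs)
    # backtrack from the best final tag
    tag = max(range(n_tags), key=lambda t: score[t])
    path = [tag]
    for ptrs in reversed(back):
        tag = ptrs[tag]
        path.append(tag)
    return path[::-1]

# Toy example with two tags; transitions reward staying in the same tag,
# so the decoder smooths the per-token evidence into a consistent path.
emissions = [[3, 0], [0, 3], [0, 3]]
transitions = [[1, -1], [-1, 1]]
print(viterbi(emissions, transitions))  # → [0, 1, 1]
```

In the full model, the same transition matrix also enters the training loss, so the CRF learns which tag sequences (e.g. B-PER followed by I-PER) are plausible rather than scoring each token independently.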


