Abstract

Most named entity recognition approaches attend, during text preprocessing, only to the vector representations of individual words and characters, and seldom to the semantic relationships within the text. Natural-language text contains many pronouns and polysemous words, so the problem of word-sense ambiguity already arises at the preprocessing stage. To address this problem, this paper adopts a Chinese named entity recognition method based on a BERT-Transformer-BiLSTM-CRF model. First, a BERT model pre-trained on a large-scale corpus dynamically generates a word-vector sequence conditioned on the input context; a Transformer encoder then models the long-distance contextual semantic features of the text; a BiLSTM model extracts sentence-level context features; and finally the feature-vector sequence is fed into a CRF (Conditional Random Field) to obtain the final prediction. Experiments on the public MSRA Chinese corpus show that the model improves precision, recall, and F1 over most baseline models.
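The final CRF step scores whole tag sequences (combining per-token emission scores from the BiLSTM with tag-to-tag transition scores) and decodes the best sequence with the Viterbi algorithm. The paper does not give implementation details, so the following is a minimal, self-contained sketch of that decoding step in NumPy; the function name `viterbi_decode`, the toy tag set, and the example scores are illustrative assumptions, not the authors' code.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """CRF decoding: return the highest-scoring tag index sequence.

    emissions:   (seq_len, num_tags) per-token tag scores
                 (in the paper's model, produced by the BiLSTM layer)
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    # score[t, j] = best score of any tag path ending in tag j at step t
    score = np.zeros((seq_len, num_tags))
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    score[0] = emissions[0]
    for t in range(1, seq_len):
        # cand[i, j] = path score ending in tag i, then transitioning to tag j
        cand = score[t - 1][:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # Trace the best path backwards from the best final tag
    best = [int(score[-1].argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# Toy example with tags 0="O", 1="B", 2="I": forbidding the O -> I
# transition is what distinguishes CRF decoding from greedy per-token argmax.
em = np.array([[5.0, 1.0, 1.0],
               [1.0, 5.0, 1.0],
               [1.0, 1.0, 5.0]])
tr = np.zeros((3, 3))
tr[0, 2] = -100.0  # disallow O -> I
print(viterbi_decode(em, tr))  # [0, 1, 2]
```

In the full model, `emissions` would come from the BiLSTM over BERT-plus-Transformer features, and `transitions` would be learned jointly with the rest of the network rather than hand-set as here.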
