Chinese Named Entity Recognition Based on BERT and Lexicon Enhancement

Jingsheng Zhao, Mingyu Cui, Shuai Yan, Xiang Gao, Qihui Ni

doi:10.1145/3584376.3584482

Abstract

Named entity recognition (NER) is an important part of information extraction and knowledge graph construction, and a basic task of natural language processing. Chinese NER mainly adopts word-based and character-based methods. Word-based methods rely on word segmentation, and common segmentation methods make errors that easily propagate to the recognition step. Character-based methods avoid this problem but do not make full use of lexicon information. The performance of Chinese NER can be effectively improved by introducing lexicon information into character-based NER. In this paper, we propose a BERT-IDCNN-CRF model combined with the SoftLexicon method. First, the BERT pre-trained language model is used to train the character embedding vectors, and the lexicon information is obtained by the SoftLexicon method. Then, the lexicon information is combined with the trained character vector representation. Next, the fused vector representation is fed into the IDCNN model for further training. Finally, the CRF model produces the Chinese named entity recognition results. The experimental results show that the F1 score reaches 95.95%, 70.63% and 95.28% on the Resume, Weibo and MSRA datasets, respectively, and that training is faster than with BERT-BiLSTM-CRF.
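The lexicon step can be illustrated with a small sketch. The abstract does not spell out how SoftLexicon gathers lexicon information, so the code below follows the commonly described form of the method, in which every character collects the lexicon words that contain it into four sets: B (the word begins at this character), M (the character is inside the word), E (the word ends at this character) and S (the character is itself a lexicon word). The function name `soft_lexicon_sets`, the `max_word_len` parameter and the toy lexicon are assumptions made for illustration, not the authors' implementation. The later steps (embedding the sets, fusing them with BERT character vectors, IDCNN and CRF) are not shown.

```python
from typing import Dict, List, Set


def soft_lexicon_sets(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Dict[str, Set[str]]]:
    """Collect B/M/E/S word sets for every character of `sentence`.

    Illustrative sketch only: `lexicon` is a plain set of words and
    `max_word_len` (hypothetical parameter) bounds the matched span length.
    """
    sets: List[Dict[str, Set[str]]] = [
        {"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence
    ]
    n = len(sentence)
    for start in range(n):
        # Try every span that starts here, up to max_word_len characters.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)      # single-character word
            else:
                sets[start]["B"].add(word)      # word begins at this character
                sets[end - 1]["E"].add(word)    # word ends at this character
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].add(word)    # character is inside the word
    return sets


# Toy lexicon, assumed for illustration only.
toy_lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"}
toy_sentence = "南京市长江大桥"

for char, char_sets in zip(toy_sentence, soft_lexicon_sets(toy_sentence, toy_lexicon)):
    print(char, {tag: sorted(words) for tag, words in char_sets.items()})
```

In a full model, each set would typically be pooled into a fixed-size vector and concatenated with the character's BERT embedding before the IDCNN layer. That fusion detail is likewise an assumption here, since the abstract only states that the two representations are combined.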

Full Text