CWAI-CNER: Chinese entity recognition based on adaptive incorporation of characters and words

Pai Peng, Xu Wu, Jingchen Wu, Xiaqing Xie

doi:10.1109/iccece51280.2021.9342310

Abstract

Chinese Named Entity Recognition (CNER) is an important subtopic of Chinese Natural Language Processing and plays an important role in multiple tasks. However, entity boundaries are hard to determine in Chinese text because Chinese words are not naturally separated, which makes CNER considerably more difficult. In addition, mainstream Named Entity Recognition (NER) is based on sequence tagging, which makes labeling a training set very costly, so many NER tasks are limited by insufficient training data. In this work, we propose CWAI, a new CNER method based on the adaptive incorporation of characters and words, to address the loss of word information caused by the lack of word boundaries. CWAI uses a convolutional neural network (CNN) to capture the local semantics of every character, and then uses an attention mechanism between characters and words to adaptively calculate, for each character, the weights of the potential words that match a lexicon. To address the limited model performance caused by insufficient training data, we combine our model with pre-trained models.
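
The character-word incorporation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the pseudo-random embeddings (`embed`), the fixed width-3 smoothing kernel standing in for a learned CNN, the scaled dot-product form of the attention, and all function names are assumptions made purely for illustration.

```python
import math
import random

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def embed(token, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding (assumption)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def match_words(sentence, lexicon, max_len=4):
    """For each character position, list the lexicon words (length >= 2) that cover it."""
    matches = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


def local_features(char_vecs, kernel=(0.25, 0.5, 0.25)):
    """Zero-padded width-3 convolution over the character sequence, followed by tanh.

    A real CNN would learn matrix-valued filters; one fixed scalar kernel shared by
    all dimensions is a simplification.
    """
    if not char_vecs:
        return []
    dim = len(char_vecs[0])
    zero = [0.0] * dim
    padded = [zero] + char_vecs + [zero]
    out = []
    for i in range(len(char_vecs)):
        window = padded[i:i + 3]
        out.append([math.tanh(sum(k * v[d] for k, v in zip(kernel, window)))
                    for d in range(dim)])
    return out


def attend(char_vec, word_vecs):
    """Scaled dot-product attention of one character over its matched words."""
    if not word_vecs:
        return [], [0.0] * len(char_vec)
    scale = math.sqrt(len(char_vec))
    scores = [sum(c * w for c, w in zip(char_vec, wv)) / scale for wv in word_vecs]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = [sum(a * wv[d] for a, wv in zip(weights, word_vecs))
               for d in range(len(char_vec))]
    return weights, summary


def fuse(sentence, lexicon):
    """Return (matched words, attention weights, fused vector) for every character."""
    chars = local_features([embed(c) for c in sentence])
    matches = match_words(sentence, lexicon)
    fused = []
    for c_vec, words in zip(chars, matches):
        weights, summary = attend(c_vec, [embed(w) for w in words])
        fused.append((words, weights, c_vec + summary))  # concatenate char and word views
    return fused


sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
for ch, (words, weights, _) in zip(sentence, fuse(sentence, lexicon)):
    print(ch, [(w, round(a, 3)) for w, a in zip(words, weights)])
```

In this example the character 市 is covered by both 南京市 and 市长, which is exactly the boundary ambiguity the abstract refers to. The attention weights give a per-character soft preference among such competing words; in a trained model those weights would be learned rather than derived from random vectors, and the fused vectors would feed a sequence tagger.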

Full Text