Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph

Peng Zhu, Aoying Zhou, Yifeng Luo, Dingjiang Huang, Weining Qian, Fangzhou Yang, Dawei Cheng

doi:10.1109/taslp.2022.3153261

Abstract

Named entity recognition (NER) is a preliminary task in natural language processing (NLP). Recognizing Chinese named entities in unstructured text is challenging because Chinese lacks explicit word boundaries. Although Chinese word segmentation (CWS) can help determine word boundaries, it remains difficult to decide which words should be grouped together to form an entity, since entities are often composed of multiple segmented words. Because dependency relationships between segmented words can help determine entity boundaries, it is crucial to use syntactic dependency information to improve NER performance. In this paper, we propose a novel NER model that learns information from syntactic dependency graphs with graph neural networks and merges the learned information into the classic Bidirectional Long Short-Term Memory (BiLSTM) - Conditional Random Field (CRF) NER scheme. In addition, we extract various kinds of task-specific hidden information from multiple CWS and part-of-speech (POS) tagging tasks to further improve the NER model. Finally, we use multiple self-attention components to integrate the different kinds of extracted information for named entity identification. Experimental results on three public benchmark datasets show that our model outperforms state-of-the-art baselines in most scenarios.
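To make the idea of merging dependency-graph information into a sequence model concrete, the following is a minimal sketch and not the authors' model. It assumes a hypothetical three-word segmented phrase, one-hot vectors standing in for the BiLSTM hidden states, and a single parameter-free mean-aggregation step standing in for a learned graph neural network layer. The function names (`dependency_adjacency`, `mean_aggregate`, `fuse`) are illustrative only; the CRF layer, the multi-task CWS/POS features, and the self-attention components are omitted.

```python
from typing import List

Matrix = List[List[float]]


def dependency_adjacency(heads: List[int]) -> Matrix:
    """Build a symmetric adjacency matrix with self-loops.

    heads[i] is the index of word i's syntactic head, or -1 for the root.
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def mean_aggregate(adj: Matrix, feats: Matrix) -> Matrix:
    """One graph step: average the features of each word and its neighbours."""
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        out.append([
            sum(row[j] * feats[j][d] for j in range(len(feats))) / degree
            for d in range(dim)
        ])
    return out


def fuse(seq_feats: Matrix, graph_feats: Matrix) -> Matrix:
    """Concatenate sequence features with graph features word by word."""
    return [s + g for s, g in zip(seq_feats, graph_feats)]


# Hypothetical example: "上海 / 浦东 / 机场" (Shanghai / Pudong / Airport),
# where both modifiers depend on the head noun at index 2 (assumed parse).
heads = [2, 2, -1]
seq_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # stand-in for BiLSTM states

adj = dependency_adjacency(heads)
graph_feats = mean_aggregate(adj, seq_feats)
fused = fuse(seq_feats, graph_feats)
```

In this toy case each modifier's graph feature mixes in the head noun's feature, which is the kind of signal that could indicate the three words belong to one entity. A real model would replace the mean aggregation with learned graph layers and pass the fused features to a tagger.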

Full Text