Compared with English named entity recognition (NER), Chinese NER faces significant challenges: word formation is flexible and non-standard, and word boundaries are vague, which introduces boundary ambiguity and reduces the accuracy of entity identification. To address this issue, we propose a boundary enhancement with multi-class information model (BEMCI). The model integrates multiple types of information into the text embedding and then reinforces the syntactic-structure information used in later stages. A syntactic information analysis module analyzes sentence structure and highlights important syntactic information from three aspects: part-of-speech tags, syntactic constituents, and dependency relations. In addition, an improved contextual attention mechanism, which combines contextual and syntactic information and uses a gate mechanism to control how the two are weighted in the fusion, is proposed to further strengthen the model’s boundary determination. Multiple sets of experiments on six general datasets show that BEMCI outperforms the baselines on four of the six datasets.
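The gated fusion of contextual and syntactic information mentioned above can be illustrated with a minimal sketch. This is not the BEMCI implementation: the function name `gated_fusion`, the per-dimension weights `w_context` and `w_syntax`, the `bias` vector, and the convex-combination form of the output are all assumptions chosen to show the general idea of a learned gate that decides, per feature, how much to take from each source.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    context: List[float],
    syntax: List[float],
    w_context: List[float],
    w_syntax: List[float],
    bias: List[float],
) -> List[float]:
    """Fuse a contextual vector and a syntactic vector for one token.

    Hypothetical formulation (an assumption, not taken from the paper):
        g_i = sigmoid(w_context_i * c_i + w_syntax_i * s_i + bias_i)
        h_i = g_i * c_i + (1 - g_i) * s_i
    The weights are per-dimension scalars to keep the sketch small;
    a real model would typically use learned weight matrices.
    """
    sizes = {len(context), len(syntax), len(w_context), len(w_syntax), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all input vectors must have the same length")

    fused = []
    for c, s, wc, ws, b in zip(context, syntax, w_context, w_syntax, bias):
        gate = sigmoid(wc * c + ws * s + b)  # share taken from the context feature
        fused.append(gate * c + (1.0 - gate) * s)
    return fused


# With zero weights and zero bias every gate equals 0.5,
# so the fused vector is the plain average of the two inputs.
example = gated_fusion(
    context=[1.0, 2.0],
    syntax=[3.0, 0.0],
    w_context=[0.0, 0.0],
    w_syntax=[0.0, 0.0],
    bias=[0.0, 0.0],
)
```

In this formulation a gate value near 1 keeps the contextual feature and a value near 0 keeps the syntactic one, so training the gate parameters lets a model lean on syntax where it is more informative for locating entity boundaries.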