Abstract

Compared with English named entity recognition (NER), Chinese NER is more challenging because Chinese word formation is flexible and non-standard and word boundaries are vague, which creates boundary ambiguity and reduces the accuracy of entity identification. To address this issue, we propose a boundary enhancement with multi-class information model (BEMCI). The model integrates multiple types of information into the text embedding and enhances the subsequent syntactic-structure information. A syntactic information analysis module highlights important syntactic information from three aspects (part-of-speech tags, syntactic constituents, and dependency relations) to analyze sentence structure. In addition, an improved contextual attention mechanism, which combines contextual and syntactic information and uses a gate mechanism to control how the two are weighted in the fusion, further enhances the model’s boundary determination. Experiments on six general datasets show that BEMCI outperforms other baselines, achieving the best results on four of the six datasets.
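
The gate-controlled fusion of contextual and syntactic information mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function name `gated_fusion`, the parameters `w_c`, `w_s`, and `bias`, and the common form g = sigmoid(W_c c + W_s s + b), h = g * c + (1 - g) * s are chosen for illustration and may differ from what BEMCI actually uses.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gated_fusion(context: Vector, syntax: Vector,
                 w_c: Matrix, w_s: Matrix, bias: Vector) -> Vector:
    """Fuse a contextual and a syntactic feature vector for one token.

    Hypothetical formulation (not taken from the paper):
        gate  = sigmoid(w_c @ context + w_s @ syntax + bias)
        fused = gate * context + (1 - gate) * syntax   (element-wise)
    """
    if len(context) != len(syntax):
        raise ValueError("context and syntax vectors must have equal length")
    pre_activation = [
        a + b + c
        for a, b, c in zip(matvec(w_c, context), matvec(w_s, syntax), bias)
    ]
    gate = [sigmoid(p) for p in pre_activation]
    # Each gate value decides how much of the contextual feature to keep
    # versus the syntactic feature in that dimension.
    return [g * c + (1.0 - g) * s for g, c, s in zip(gate, context, syntax)]


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With zero weights and zero bias every gate value is 0.5,
    # so the fused vector is the element-wise mean of the two inputs.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], zeros, zeros, [0.0, 0.0]))
```

In a trained model the weights and bias would be learned, so the gate can lean toward syntactic features for tokens near uncertain entity boundaries and toward contextual features elsewhere; the zero-weight call above only demonstrates the mechanics.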
