Abstract

Fusing lexicon information into Chinese characters, each of which typically has several meanings, has proven effective for Chinese Named Entity Recognition (NER). However, existing approaches that incorporate a matched Chinese word into its constituent characters treat the word as an indivisible whole, so they fail to capture fine-grained correlations in the word-character space and do not make full use of the lexicon information. Moreover, existing approaches use fixed (static) weights between words and characters, which limits NER performance. Because the same word-character pair interacts differently in different contexts, the weights of matched word-character pairs should be dynamic rather than fixed. In this paper, we propose a Polymorphic Graph Attention Network (PGAT) that captures the dynamic correlation between characters and matched words along multiple dimensions in order to enhance the character representation. After obtaining the matched words of each character from the lexicon, we map each word-character pair to one of four positions: B (begin), M (middle), E (end), and S (single-character word). The proposed semantic fusion unit, based on the Graph Attention Network (GAT), dynamically modulates the attention between matched words and characters in the four dimensions B, M, E, and S, and thus explicitly captures the fine-grained correlation between characters and matched words in each dimension. Experiments on four Chinese NER datasets show that PGAT outperforms the baseline models, demonstrating the importance of the attention capture and fusion capabilities of the proposed polymorphic graph. Furthermore, PGAT operates in the character representation layer, which makes it easy to combine with pre-trained models such as BERT and with other sequence encoders such as CNN and Transformer.
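
To make the B/M/E/S mapping concrete, the following is a minimal sketch of how matched lexicon words might be grouped by the position each character occupies in them. It is an illustration under stated assumptions, not the PGAT implementation: the function name `match_bmes`, the `max_len` cutoff, the toy lexicon, and the brute-force substring search are all hypothetical choices made here, and the attention-based fusion step is not shown.

```python
from collections import defaultdict


def match_bmes(sentence, lexicon, max_len=4):
    """Group matched lexicon words per character by position (B, M, E, S).

    Hypothetical helper: enumerates every substring of up to `max_len`
    characters and keeps those found in `lexicon`.
    """
    slots = [defaultdict(list) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                # A one-character word matches only itself.
                slots[start]["S"].append(word)
                continue
            slots[start]["B"].append(word)        # first character
            for mid in range(start + 1, end - 1):
                slots[mid]["M"].append(word)      # interior characters
            slots[end - 1]["E"].append(word)      # last character
    return [dict(s) for s in slots]


# Toy lexicon, assumed for illustration only.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"}
sentence = "南京市长江大桥"
for char, groups in zip(sentence, match_bmes(sentence, lexicon)):
    print(char, groups)
```

In this sketch a single character can collect words in several groups at once (for example, 江 ends 长江, sits inside 长江大桥, and is itself a one-character word), which is the kind of per-position structure over which a per-dimension attention mechanism could then assign context-dependent weights.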
