Abstract
Named entity recognition (NER) is a fundamental task in natural language processing that jointly predicts entity boundaries and pre-defined entity categories. In Chinese NER, the recognition of long entities has not yet been well addressed: as the character sequences of entities grow longer, the task becomes harder for existing character-based and word-based neural methods. In this paper, we investigate Chinese NER methods that operate on subword units and propose recognizing long Chinese entities based on subword encoding. First, our method generates subword units from known entities, which avoids the noise introduced by Chinese word segmentation and makes long entity boundaries easier to determine. Then, subword-character mixed sequences of sentences serve as input to character-based neural methods that perform Chinese NER. We apply our method to iterated dilated convolutional neural networks (ID-CNNs) with conditional random fields (CRF) for entity recognition. Experimental results on the benchmark People's Daily and Weibo datasets show that our subword-based method achieves strong performance on long entity recognition.
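To make the idea of a subword-character mixed sequence concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a greedy longest-match against a small, hypothetical subword vocabulary (`SUBWORDS`); the abstract does not specify how subword units are actually generated from known entities, and the function name `mixed_sequence` and the `max_len` parameter are illustrative only.

```python
from typing import List, Set


def mixed_sequence(sentence: str, subwords: Set[str], max_len: int = 4) -> List[str]:
    """Split a sentence into a subword-character mixed sequence.

    Assumption: greedy longest-match. At each position, emit the longest
    known subword (length >= 2) that starts there; otherwise fall back to
    a single character.
    """
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        match = None
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(sentence) - i), 1, -1):
            candidate = sentence[i:i + n]
            if candidate in subwords:
                match = candidate
                break
        if match is None:
            # No subword starts here: keep the single character.
            match = sentence[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical subword vocabulary, as if derived from known entities.
SUBWORDS = {"北京", "大学", "人民", "医院"}

sentence = "我在北京大学人民医院工作"
tokens = mixed_sequence(sentence, SUBWORDS)
print(tokens)
```

Under these assumptions, the long entity 北京大学人民医院 is represented by four subword tokens instead of eight characters, while the surrounding text stays character-level. A sequence of this kind could then be passed to a character-based tagger such as an ID-CNN with a CRF layer.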