Abstract

Named entity recognition (NER) is a fundamental task in natural language processing that jointly predicts entity boundaries and their pre-defined categories. In Chinese NER, the recognition of long entities has not yet been well addressed: as the character sequences of entities grow longer, recognition becomes more difficult for existing character-based and word-based neural methods. In this paper, we investigate Chinese NER methods that operate on subword units and propose to recognize long Chinese entities based on subword encoding. First, our method generates subword units from known entities, which avoids the noise introduced by Chinese word segmentation and makes the boundaries of long entities easier to determine. Then, the subword-character mixed sequences of sentences serve as input to character-based neural methods that perform Chinese NER. We apply our method to iterated dilated convolutional neural networks (ID-CNNs) with conditional random fields (CRF) for entity recognition. Experimental results on the benchmark People’s Daily and Weibo datasets show that our subword-based method achieves strong performance on long entity recognition.
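To make the notion of a subword-character mixed sequence concrete, the following is a minimal sketch that assumes a subword vocabulary has already been derived from known entities. The function name `to_mixed_sequence`, the toy vocabulary, and the greedy longest-match strategy are illustrative assumptions and are not taken from the paper, which does not specify its segmentation procedure here.

```python
from typing import List, Set


def to_mixed_sequence(sentence: str, subwords: Set[str]) -> List[str]:
    """Split a sentence into a mix of subword units and single characters.

    Assumption: `subwords` is a hypothetical vocabulary of multi-character
    units learned from known entities. Matching is greedy longest-match,
    chosen here only for illustration.
    """
    max_len = max((len(s) for s in subwords), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        match = sentence[i]  # fall back to a single character
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(sentence) - i), 1, -1):
            candidate = sentence[i:i + n]
            if candidate in subwords:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    # Toy vocabulary, made up for this example.
    vocab = {"中华", "人民", "共和国"}
    print(to_mixed_sequence("我爱中华人民共和国", vocab))
```

Under these assumptions, characters outside any known subword stay as single-character tokens, so the resulting sequence can be fed to a character-level tagger in place of the plain character sequence.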
