Abstract

This paper proposes a new approach to personal name recognition in Chinese text. The approach combines rule-based and statistical methods and draws on extensive linguistic knowledge. First, candidate personal names are collected as candidate entities and passed to a statistical model, a conditional random field (CRF), which decides whether each candidate is a genuine personal name. We also propose a dynamic priority method to address the problem that part of a foreign personal name may be misrecognized as a Chinese personal name. Unlike most previous CRF-based models, our model uses probabilistic feature functions instead of binary feature functions. We also explore several new features, including confidence functions, contextual semantics, and contextual surroundings. As in some previous work, we use separate sub-models for Chinese personal names, foreign personal names, and abbreviated personal names, but we introduce several new techniques into these sub-models. Experimental results show that our CRF model, combining these new elements, yields significant improvements.
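
The abstract does not give the model's exact feature definitions, so the following is only a minimal sketch of what "probabilistic instead of binary feature functions" can mean in a linear-chain CRF. It assumes a character-level B/I/O tag set, and it contrasts a feature that returns a surname likelihood in [0, 1] with a binary feature that returns 0 or 1. The tag set, the surname table `SURNAME_PROB`, the feature functions `prob_surname`, `binary_surname` and `bad_transition`, and the weights are all hypothetical illustrations, not the paper's. The partition function Z(x) is computed by brute-force enumeration, which is practical only for very short inputs. A real implementation would use the forward algorithm.

```python
import itertools
import math

# Hypothetical tag set: B = first character of a name, I = inside a name, O = outside.
TAGS = ("B", "I", "O")

# Hypothetical surname likelihoods (illustrative numbers, not from the paper).
SURNAME_PROB = {"王": 0.9, "李": 0.85, "在": 0.01}


def prob_surname(prev_tag, tag, chars, t):
    """Probabilistic feature: surname likelihood in [0, 1] when tagging B."""
    return SURNAME_PROB.get(chars[t], 0.0) if tag == "B" else 0.0


def binary_surname(prev_tag, tag, chars, t):
    """Binary counterpart: 1 whenever the character appears in the surname table."""
    return 1.0 if tag == "B" and chars[t] in SURNAME_PROB else 0.0


def bad_transition(prev_tag, tag, chars, t):
    """Binary feature: fires when I does not follow B or I (an ill-formed name)."""
    return 1.0 if tag == "I" and prev_tag not in ("B", "I") else 0.0


def score(tags, chars, features, weights):
    """Unnormalised log-score: sum over t and k of w_k * f_k(y_{t-1}, y_t, x, t)."""
    total = 0.0
    prev = None  # no previous tag at position 0
    for t, tag in enumerate(tags):
        total += sum(w * f(prev, tag, chars, t) for f, w in zip(features, weights))
        prev = tag
    return total


def conditional_prob(tags, chars, features, weights):
    """P(y | x) = exp(score(y, x)) / Z(x), with Z(x) summed over all tag sequences."""
    z = sum(
        math.exp(score(y, chars, features, weights))
        for y in itertools.product(TAGS, repeat=len(chars))
    )
    return math.exp(score(tags, chars, features, weights)) / z


if __name__ == "__main__":
    weights = [2.0, -5.0]  # hypothetical; in practice learned from training data
    for text in ("王明", "在明"):
        p = conditional_prob(("B", "I"), list(text), [prob_surname, bad_transition], weights)
        b = conditional_prob(("B", "I"), list(text), [binary_surname, bad_transition], weights)
        print(text, "probabilistic:", round(p, 3), "binary:", round(b, 3))
```

With the binary feature, the common surname 王 and the rarely surnamed character 在 contribute the same score, so both candidates receive the same probability of being a name. The probabilistic feature lets the model weight them by how likely each character is to begin a personal name.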
