Chinese POS tagging based on maximum entropy model

Jian Zhao, Xiao-Long Wang

doi:10.1109/icmlc.2002.1174406

Abstract

Part-of-speech (POS) tagging is a basic task in natural language processing, and tagging precision has an important effect on the results of later processing stages such as syntactic analysis. This paper presents a Chinese POS tagger based on the maximum entropy model. The tagger is trained on a large corpus annotated with Chinese POS tags and assigns the best tag sequence to each Chinese sentence to be annotated. All features that are useful for predicting POS tags are mined in order to bring the model closer to the real case. In addition, to address overfitting, a smoothing method is applied and a POS dictionary is maintained, which reduces the model's dependence on the training data and improves the efficiency of the search process. Open test results show that Chinese POS tagging with this method achieves an accuracy of 96.8%.
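To make the ideas in the abstract concrete, the following is a minimal sketch of how a maximum entropy tagger scores candidate tags, and of how a POS dictionary can narrow the search. It is not the authors' implementation. The feature templates, the toy weights, the function names, and the greedy left-to-right decoding are all assumptions chosen for illustration; the abstract does not specify them, and a real tagger would learn the weights from the annotated corpus and would typically search over whole tag sequences.

```python
import math

def features(words, i, prev_tag):
    """Hypothetical feature templates: current word, previous tag, previous word."""
    prev_word = words[i - 1] if i > 0 else "<s>"
    return [f"w={words[i]}", f"t-1={prev_tag}", f"w-1={prev_word}"]

def tag_probs(weights, feats, candidates):
    """Maximum entropy form: p(t | h) = exp(sum_i lambda_i * f_i(h, t)) / Z(h).

    Normalization runs over the candidate tags only.
    """
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in candidates}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def greedy_tag(words, weights, tag_dict, all_tags):
    """Greedy decoding (a simplification, assumed here for brevity).

    The POS dictionary restricts each known word to the tags listed for it,
    which shrinks the search space; unknown words fall back to all tags.
    """
    tags = []
    prev = "<s>"
    for i, word in enumerate(words):
        candidates = tag_dict.get(word, all_tags)
        probs = tag_probs(weights, features(words, i, prev), candidates)
        prev = max(probs, key=probs.get)
        tags.append(prev)
    return tags

# Toy, hand-set weights (assumed for illustration; normally estimated from a corpus).
toy_weights = {
    ("w=我", "r"): 2.0,
    ("w=爱", "v"): 1.5,
    ("t-1=r", "v"): 0.5,
    ("w=北京", "ns"): 2.0,
}
toy_tag_dict = {"我": ["r"], "爱": ["v", "n"]}
toy_all_tags = ["r", "v", "n", "ns"]

result = greedy_tag(["我", "爱", "北京"], toy_weights, toy_tag_dict, toy_all_tags)
print(result)
```

In this sketch the dictionary plays the role the abstract describes: a word seen with only one or two tags never has the remaining tags scored, so fewer candidates are evaluated per position.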
