Abstract

Part-of-speech (POS) tagging is a fundamental task in natural language processing, and its accuracy strongly affects later processing stages such as syntactic analysis. This paper presents a Chinese POS tagger based on the maximum entropy model. The tagger is trained on a large corpus annotated with Chinese POS tags and assigns the best tag sequence to each input sentence. The model mines features that are useful for predicting POS tags so that it fits real usage more closely. To address overfitting, a smoothing method and a POS dictionary are used to reduce the model's dependence on the training data and to improve the efficiency of the search process. In open tests, Chinese POS tagging with this method achieves an accuracy of 96.8%.
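
The abstract names the ingredients of the tagger but not their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes hypothetical feature templates (current word, neighboring words, previous tag, last character); a Gaussian prior (an L2 penalty with variance `sigma2`) as the smoothing method, which the abstract does not specify; plain gradient ascent rather than GIS/IIS for training; beam search as the search process; and a POS dictionary mapping each training word to the tags it was observed with, used to prune candidate tags at decoding time. The names `MaxEntTagger`, `features`, and the toy tag set (`r`, `v`, `ns`) are illustrative.

```python
import math
from collections import defaultdict


def features(words, i, prev_tag):
    """Hypothetical feature templates for word i given the previous tag."""
    w = words[i]
    return [
        "bias",
        "w=" + w,
        "prev_w=" + (words[i - 1] if i > 0 else "<s>"),
        "next_w=" + (words[i + 1] if i + 1 < len(words) else "</s>"),
        "prev_t=" + prev_tag,
        "last_char=" + w[-1],  # suffix cue, useful for unseen Chinese words
    ]


class MaxEntTagger:
    def __init__(self, sigma2=1.0):
        self.weights = defaultdict(float)  # (feature, tag) -> lambda
        self.tag_dict = defaultdict(set)   # POS dictionary: word -> tags seen in training
        self.tags = set()
        self.sigma2 = sigma2               # Gaussian prior variance (assumed smoothing)

    def probs(self, feats):
        """p(tag | context) = exp(sum of lambdas) / Z, normalized over all tags."""
        scores = {t: sum(self.weights[(f, t)] for f in feats) for t in self.tags}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exps.values())
        return {t: e / z for t, e in exps.items()}

    def candidates(self, word):
        """Dictionary lookup; unseen words may take any tag."""
        return self.tag_dict.get(word) or self.tags

    def train(self, corpus, epochs=50, lr=0.5):
        """Gradient ascent on the log-likelihood minus a Gaussian (L2) penalty."""
        events = []
        for words, tags in corpus:
            prev = "<s>"
            for i, (w, t) in enumerate(zip(words, tags)):
                self.tag_dict[w].add(t)
                self.tags.add(t)
                events.append((features(words, i, prev), t))
                prev = t
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in events:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0   # empirical feature count
                    for t, pt in p.items():
                        grad[(f, t)] -= pt   # expected count under the model
            for key in set(grad) | set(self.weights):
                penalty = self.weights[key] / self.sigma2
                self.weights[key] += lr * (grad[key] - penalty) / len(events)

    def tag(self, words, beam=3):
        """Beam search; the POS dictionary prunes each word's candidate tags."""
        beams = [(0.0, [])]  # (log-probability, tag sequence)
        for i, w in enumerate(words):
            expanded = []
            for logp, seq in beams:
                prev = seq[-1] if seq else "<s>"
                p = self.probs(features(words, i, prev))
                for t in self.candidates(w):
                    expanded.append((logp + math.log(p[t]), seq + [t]))
            beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam]
        return beams[0][1]


if __name__ == "__main__":
    # Toy corpus (hypothetical): r = pronoun, v = verb, ns = place name
    corpus = [
        (["我", "爱", "北京"], ["r", "v", "ns"]),
        (["他", "爱", "上海"], ["r", "v", "ns"]),
        (["我", "去", "上海"], ["r", "v", "ns"]),
    ]
    tagger = MaxEntTagger(sigma2=1.0)
    tagger.train(corpus)
    print(tagger.tag(["他", "去", "天津"]))  # 天津 does not occur in training
```

In this sketch the model is trained with normalization over the full tag set, so the dictionary only prunes the search rather than changing the probabilities; words missing from the dictionary fall back to every tag, and context features such as the previous tag decide among them. The L2 penalty keeps weights for rare features small, which is one common way to reduce dependence on the training data.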

