Using Uniform-Design GEP for Part-of-Speech Tagging

Chengyao Lv, Yuan Liang, Fangyuan Li, Yuanxing Dong, Huihua Liu

doi:10.1142/s0218126617500608

Abstract

In natural language processing (NLP), a part-of-speech (POS) tagger is a crucial subsystem in a wide range of applications: it labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb, or adjective. This paper proposes a uniform-design genetic expression programming (UGEP) model for POS tagging. UGEP is used to search for appropriate structures in the function space of POS tagging problems. After evolving sequences of tags, GEP returns the best individual as the solution. Experiments on the Brown Corpus show that (1) in closed-lexicon tests, the UGEP model achieves an accuracy of 98.8%, which is much better than the genetic algorithm, neural network, and hidden Markov model (HMM) models; and (2) in open-lexicon tests, the proposed model achieves an accuracy of 97.4% overall and 88.6% on unknown words.
