Abstract

Part-of-speech (POS) tagging is an essential task in natural language processing and computational linguistics for every natural language. In this paper, we present experimental results on one of the state-of-the-art probabilistic models for sequence classification, the Maximum Entropy Markov Model (MEMM), applied to tagging the Oromo language. The model assigns the correct part-of-speech tag to each word or token of a sentence by taking many features and contexts into account, and we found the MEMM to be an effective way to estimate the word classes of Oromo text. To evaluate the model, experiments were conducted on a manually annotated corpus of 452 Oromo sentences (6,094 words in total). Experimental results show that the tagger performs well, with an accuracy of 93.01% under tenfold cross validation. From these results we conclude that the MEMM modelling technique shows some advantages over Hidden Markov Models for sequence tagging, since it offers greater freedom in choosing the features used to represent observations for POS tagging of Oromo.
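For background, the standard MEMM formulation (not spelled out in this abstract, but assumed here in its usual form) conditions each tag on the previous tag and the current observation through a locally normalized maximum entropy distribution:

P(s_i \mid s_{i-1}, o_i) \;=\; \frac{1}{Z(o_i, s_{i-1})} \exp\!\Big(\sum_{a} \lambda_a \, f_a(o_i, s_i)\Big)

where the f_a are binary feature functions over the observation and candidate tag, the \lambda_a are learned weights, and Z(o_i, s_{i-1}) normalizes over the possible tags. The highest-scoring tag sequence is then recovered with Viterbi decoding over these local distributions, which is what lets the model combine arbitrary overlapping features of the input with sequence-level inference.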
