Abstract

Part-of-speech (POS) tagging is a fundamental task in Natural Language Processing (NLP). It provides useful information for many other NLP tasks, including word sense disambiguation, text chunking, named entity recognition, syntactic parsing, semantic role labeling, and semantic parsing. In this paper, we develop POS taggers for the Amazigh language using Conditional Random Fields (CRF), Support Vector Machines (SVM), and the TreeTagger system. We manually annotated approximately 85,000 tokens collected from written texts, using a tagset of 28 POS tags defined for Amazigh. The taggers use various contextual and orthographic word features. These features are language-independent and can also be applied to other languages. All three taggers were trained and tested on the same Amazigh corpus. The evaluation yielded accuracies of 90.08%, 89.38%, and 92.06% for the CRF, SVM, and TreeTagger taggers, respectively.
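To make "contextual and orthographic word features" concrete, here is a minimal sketch of a language-independent feature extractor and the token-level accuracy measure. The exact feature set used in the paper is not given in the abstract, so the function names, the context window of 2, and the affix length of 3 are hypothetical illustrative choices. A list of per-token feature dictionaries of this shape is the input format expected by CRF toolkits such as sklearn-crfsuite, and the same features could be vectorized for an SVM.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def word_features(tokens: List[str], i: int, window: int = 2,
                  affix_len: int = 3) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i] from its spelling and context.

    Hypothetical feature set: `window` and `affix_len` are illustrative
    values, not settings reported in the paper.
    """
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "word": word.lower(),
        "is_digit": word.isdigit(),
        "is_title": word.istitle(),
        "has_hyphen": "-" in word,
    }
    # Orthographic features: prefixes and suffixes of length 1..affix_len.
    for k in range(1, affix_len + 1):
        feats[f"prefix{k}"] = word[:k]
        feats[f"suffix{k}"] = word[-k:]
    # Contextual features: neighbouring words, padded at sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}]"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dict per token, as a sequence tagger would consume it."""
    return [word_features(tokens, i) for i in range(len(tokens))]


def token_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

Because none of the features depend on a lexicon or language-specific rules, the extractor can be run on any tokenized text; for example, `sentence_features(["The", "cats", "sleep"])` returns three dictionaries whose `suffix1` values are `"e"`, `"s"`, and `"p"`.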
