Part of Speech Tagging for Setswana African Language

S. O. Ojo, P. A. Owolawi, M. A. Dibitso

doi:10.1109/imitec45504.2019.9015871

Abstract

Part-of-speech (POS) tagging assigns an appropriate lexical category to each word in a sentence. It is a crucial step in Natural Language Processing (NLP) applications such as machine translation, spelling and grammar checking, word prediction, and information retrieval. Much work has been done on POS tagging, mainly for European and Asian languages; for African languages more work is needed, mostly because annotated corpora are lacking. Some significant work has been done on African languages such as Arabic, Igbo, Swahili, Yoruba, and the South African official languages. However, African languages are generally under-resourced, particularly in terms of the annotated lexical-semantic corpora necessary for effective NLP tools and applications, so advances in this direction have been limited. The main aim of the work reported in this paper is the development of a POS tagger model for Setswana, an under-resourced African language. Some POS taggers for different African languages are reviewed, the challenges and techniques involved in creating them are identified, and a POS tagger model for Setswana built with SVMTool is presented.
