Abstract

Parsers are efficient and accurate enough to be useful in many natural language processing systems, most notably machine translation [1]. Many sentence parsers have been developed for foreign languages such as English and Arabic and, among the local languages of Ethiopia, for Amharic. However, to the best of our knowledge, there is no Afan Oromo sentence parser that handles both simple and complex sentences. We therefore propose a sentence parser for the Afan Oromo language. Parsing Afan Oromo sentences is a necessary step for other natural language processing applications for the language, such as machine translation, question answering, knowledge extraction and information retrieval. This paper presents a rule-based parser for Afan Oromo sentences that uses a top-down chart parsing algorithm. The grammar is represented as a Context Free Grammar (CFG). A sample corpus of 500 sentences was prepared, and the CFG rules were extracted manually from the tagged sample corpus. We also developed a simple lexicon generator algorithm that automatically generates the lexical rules. The Python programming language and NLTK were used as implementation tools. Of the sample dataset, 70% are simple sentences, because we considered four types of simple sentence (declarative, interrogative, imperative and exclamatory), and the remaining 30% are complex sentences. The parser was trained on a training dataset of 400 sentences, with an accuracy of 98.25%, and tested on a testing dataset of 100 sentences, with an accuracy of 91%. These results are encouraging, since this is the first parser for both simple and complex sentences of Afan Oromo.
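To illustrate the lexicon generator idea, the following is a minimal sketch and not the implementation used in this work. It assumes the tagged corpus is available as lists of (word, tag) pairs. The function name `generate_lexical_rules`, the tags `NN`, `VB` and `JJ`, and the tokens are hypothetical placeholders, not items from the actual corpus.

```python
from collections import defaultdict


def quote_terminal(word):
    # Afan Oromo orthography uses the apostrophe inside words, so a word
    # containing one is wrapped in double quotes instead of single quotes.
    return f'"{word}"' if "'" in word else f"'{word}'"


def generate_lexical_rules(tagged_sentences):
    """Turn (word, tag) pairs into CFG lexical rules, one line per tag."""
    words_by_tag = defaultdict(list)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Keep first-seen order and skip words already listed for the tag.
            if word not in words_by_tag[tag]:
                words_by_tag[tag].append(word)

    rules = []
    for tag in sorted(words_by_tag):
        alternatives = " | ".join(quote_terminal(w) for w in words_by_tag[tag])
        rules.append(f"{tag} -> {alternatives}")
    return "\n".join(rules)


# Hypothetical placeholder tokens and tags (assumptions, not corpus data).
sample = [
    [("noun_a", "NN"), ("noun_b", "NN"), ("verb_a", "VB")],
    [("noun_a", "NN"), ("adj'a", "JJ"), ("verb_b", "VB")],
]
lexical_rules = generate_lexical_rules(sample)
print(lexical_rules)
```

In a setup like the one described here, rules generated this way could be appended to the manually extracted phrase-structure rules and the combined text passed to NLTK's `CFG.fromstring`, with the resulting grammar given to a chart parser such as `nltk.parse.chart.TopDownChartParser`. Whether the study combined the two rule sets in exactly this way is an assumption.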
