Abstract

This research presents the design, development, and experimental evaluation of a longest-match-based stemmer for Wolaita texts. The objective is to conflate the variants of Wolaita words into their stems with better accuracy, using a longest-match approach. To compile the possible combinations of suffixes, a detailed analysis of Wolaita word morphology was carried out. The C# programming language was used for data preprocessing and implementation. After preprocessing, 12,789 unique words were retained for the experiment. Of these, 1,200 words were randomly selected beforehand and set aside for testing. The stemmer was then tested using Paice's actual error counting method. On the test dataset, its output showed 91.84% accuracy against manually stemmed words. The result shows that the rule-based longest-match approach is promising for stemming Wolaita texts.
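The evaluation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C# code, and the exact procedure behind the reported figure is not shown in this excerpt. The function name `evaluate`, the length-based split into over- and under-stemming, and the sample words are all assumptions made for illustration; the sample words are English placeholders, not Wolaita data.

```python
def evaluate(predicted, gold):
    """Compare stemmer output with manually assigned stems.

    Assumption: an output shorter than the manual stem is counted as
    over-stemming, and a longer one as under-stemming. Any other mismatch
    goes into "other".
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    counts = {"correct": 0, "over": 0, "under": 0, "other": 0}
    for p, g in zip(predicted, gold):
        if p == g:
            counts["correct"] += 1
        elif len(p) < len(g):
            counts["over"] += 1    # too much was stripped
        elif len(p) > len(g):
            counts["under"] += 1   # too little was stripped
        else:
            counts["other"] += 1

    total = len(gold)
    accuracy = 100.0 * counts["correct"] / total if total else 0.0
    return counts, accuracy


# Placeholder data, not taken from the paper's test set.
predicted_stems = ["meet", "walk", "runn", "go"]
manual_stems = ["meet", "walk", "run", "going"]
counts, accuracy = evaluate(predicted_stems, manual_stems)
```

Reporting the over- and under-stemming counts alongside accuracy shows in which direction the suffix list errs, which a single percentage cannot.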

Highlights

  • Omotic languages are a group of close to 30 languages spoken in southwestern Ethiopia around the Omo River

  • Although the longest-match approach requires compiling all possible combinations of suffixes, it has low computational complexity: the suffix list is arranged in decreasing order of suffix length, so matching takes a single pass over the list

  • Stemming is important for highly inflected languages like Wolaita for many applications that require the stem of a word
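
The longest-match idea in the highlights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the suffix list uses English-like placeholders rather than the paper's compiled Wolaita suffixes, and `MIN_STEM_LENGTH` is a hypothetical guard that the source does not mention.

```python
# Placeholder suffixes for illustration only (NOT the paper's Wolaita list).
SUFFIXES = ["s", "ing", "ings", "ed", "ness"]

# Sort once, longest first, so the first hit is always the longest match.
SORTED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)

# Assumed guard: never strip a word down to fewer than this many characters.
MIN_STEM_LENGTH = 2


def stem(word, suffixes=SORTED_SUFFIXES, min_stem_length=MIN_STEM_LENGTH):
    """Strip the longest matching suffix in a single pass over the list."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    # No suffix matched, so the word is returned unchanged.
    return word


stems = [stem(w) for w in ["meetings", "meeting", "is"]]
```

Because the list is sorted by decreasing length, "meetings" loses "ings" rather than only "s", and the loop stops at the first match instead of rescanning the word.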


Summary

Introduction

Omotic languages are a group of close to 30 languages spoken in southwestern Ethiopia around the Omo River. The 28 Omotic languages are classified into northern and southern sub-families [1]. Wolaita is one of the Northern Omotic languages and is spoken in the Wolaita Zone and some other parts of the Southern Nations, Nationalities, and People's Region of Ethiopia. The Latin script has been used to write Wolaita texts since 1993 [2]. The publication of textbooks and other reference materials such as literature, newspapers, and magazines has been increasing over the years, and a significant number of people are able to read and write Wolaita. The language serves as a medium of instruction in primary school, is offered as a subject in secondary school, and is offered as a program at Wolaita Sodo University.
