Parts of Speech Tagging for Afaan Oromo

Getachew Mamo, Million Meshesha

doi:10.14569/specialissue.2011.010301

Abstract

The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language. After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopted the Hidden Markov Model (HMM) approach and implemented unigram and bigram models with the Viterbi algorithm. The unigram model is used to understand word ambiguity in the language, while the bigram model is used to undertake contextual analysis of words. For training and testing, a manually annotated sample corpus of 159 sentences (1621 words in total) is used. The corpus is collected from different public Afaan Oromo newspapers and bulletins to make the sample balanced. A database of lexical probabilities and transition probabilities is developed from the annotated corpus. From these two sets of probabilities the tagger learns to tag sequences of words in sentences. The performance of the prototype Afaan Oromo tagger is tested using tenfold cross-validation. The results show that the unigram and bigram models obtain 87.58% and 91.97% accuracy, respectively.
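The bigram HMM idea described in the abstract can be illustrated with a minimal Python sketch. It estimates lexical (emission) and transition counts from a tagged corpus and decodes with the Viterbi algorithm. This is not the authors' implementation: the function names, the add-one smoothing, the `<s>` start symbol, and the toy tokens and tags (`x`, `y`, `z`, `A`, `B`) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start symbol

def train(tagged_sentences):
    """Count tag transitions and tag-to-word emissions from (word, tag) pairs."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for unseen words.
    n_words = len({w for c in emit.values() for w in c}) + 1
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy corpus with placeholder tokens and tags; "y" is ambiguous (A or B).
corpus = [
    [("x", "A"), ("y", "B")],
    [("x", "A"), ("z", "B")],
    [("y", "A"), ("y", "B")],
]
trans, emit = train(corpus)
print(viterbi(["y", "y"], trans, emit))
```

In this toy corpus the token `y` occurs with both tags, so a unigram lookup alone cannot separate its two occurrences. The transition scores let the decoder assign a different tag to each position, which is the kind of contextual analysis the bigram model is meant to provide.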

Highlights

At the heart of any natural language processing (NLP) task, there is the issue of natural language understanding.
As explained in [1], natural languages give rise to lexical ambiguity: words may have different meanings, i.e. one word is in general connected with different readings in the lexicon.
For the Amharic language, two studies on POS tagging were conducted by [5] and [11], but to the best of our knowledge no POS tagging research has been conducted for the Afaan Oromo language.

Summary

INTRODUCTION

At the heart of any natural language processing (NLP) task, there is the issue of natural language understanding. In the above particular context, suffixes are added to show gender {-t, -ta}, number {-tu/-u} and future {-fi}. To handle such complexities and use computers to understand and manipulate natural language text and speech, various research efforts are under way. These include machine translation, information extraction and retrieval using natural language, text-to-speech synthesis, automatic written text recognition, grammar checking, and part-of-speech tagging. Most of these approaches have been developed for popular languages like English [3]. This study presents the design and development of an automatic part-of-speech tagger for the Afaan Oromo language.

PART-OF-SPEECH TAGGING

Rule based Approach

Stochastic Approach

AFAAN OROMO

RELATED RESEARCHES

APPLICATION OF THE STUDY

Algorithm Design and Implementation

Test and Evaluation

Afaan Oromo Tagsets

Corpus

Lexicon probability

Findings

Performance Analysis of the tagger


Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2011
Citations: 10	License type: cc-by
