Information Extraction Model for Afan Oromo News Text

Sisay Abera, Tesfa Tegegne

doi:10.1007/978-3-030-26630-1_28

Abstract

Information Extraction (IE) is concerned with automatically extracting facts from text and storing them in a database for easy use and management of the data. As the first research work on IE from Afan Oromo text, we designed a model that deals with the infrastructure news domain in the Oromo language. The proposed model has document preprocessing, learning and extraction, and post-processing as its main components. In this work, recall, precision, and F-measure are used as evaluation metrics for Afan Oromo Text Information Extraction (AOTIE). Trained and tested on a dataset of 3169 tokens, AOTIE achieved 79.5% precision, 80.5% recall, and 80% F-measure. These results serve as a baseline for further experiments on AOTIE, for which we set up two main scenarios. In the first scenario, a gazetteer is developed. The second scenario is aimed at observing the influence of Afan Oromo grammatical structure. Both scenarios showed that the performance of AOTIE depends mostly on the grammatical structure of Afan Oromo.
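
As a quick consistency check on the reported figures, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the paper's F-measure weights precision and recall equally.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract.
reported_precision = 0.795
reported_recall = 0.805

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.1%}")  # prints "F-measure: 80.0%"
```

Under that assumption, the result agrees with the 80% F-measure reported above.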

Full Text