Abstract

Objectives: This study develops the first named entity recognition (NER) system for the Anyuak language. NER is a fundamental subtask of natural language processing (NLP), and the accuracy of an NER system directly affects the effectiveness of downstream tasks. We address Anyuak NER with a long short-term memory (LSTM) model that assigns each token to one of a set of predefined classes.

Methods: An LSTM model is used to detect and classify words into five predefined classes: Person, Time, Organization, Location, and Others (non-named-entity words). Because feature selection plays a vital role in the LSTM framework, the experiments in this work were designed to identify the most suitable features for the Anyuak NER tagging task.

Findings: Under cross-validation, the model achieved a precision of 98%, a recall of 90%, and an F1-measure of 94%. The experimental results indicate that tag context, word features, part-of-speech tags, suffixes, and prefixes are significant features for named entity recognition and classification in Anyuak.

Novelty: We contribute a new architecture for Anyuak NER that extracts features automatically and does not depend on other NLP tasks. We demonstrate that deep learning models can be extended to, trained for, and applied to the Anyuak language.

Keywords: Named entity recognition in Anyuak; Recurrent neural network; Long short-term memory; Natural language processing; Deep learning
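The Findings name tag context, word features, part-of-speech tags, suffixes, and prefixes as significant features. As a minimal sketch, assuming a three-character affix length, a Penn-style POS tag set, and the hypothetical feature names below (none of which come from the paper), a per-token feature dictionary covering those feature types could be assembled like this:

```python
from typing import Dict, List, Optional


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   prev_tag: Optional[str], affix_len: int = 3) -> Dict[str, object]:
    """Build a feature dict for tokens[i].

    Feature names and affix_len=3 are illustrative assumptions,
    not the configuration used in the paper.
    """
    word = tokens[i]
    return {
        # Word features: surface form and simple shape cues
        "word.lower": word.lower(),
        "word.is_title": word.istitle(),
        "word.is_digit": word.isdigit(),
        # Prefix and suffix features
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
        # Part-of-speech tag supplied by an external tagger or annotation
        "pos": pos_tags[i],
        # Tag context: the class predicted for the previous token
        "prev_tag": prev_tag if prev_tag is not None else "<START>",
        # Neighbouring-word context
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }


if __name__ == "__main__":
    # English placeholder tokens with hypothetical POS tags, not Anyuak text
    tokens = ["Anyuak", "live", "in", "Gambella"]
    pos_tags = ["NNP", "VBP", "IN", "NNP"]
    feats = token_features(tokens, pos_tags, 3, prev_tag="Others")
    print(feats["prefix"], feats["suffix"])  # gam lla
```

In an LSTM tagger, features like these would typically be embedded or one-hot encoded and concatenated with the word representation at each time step.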

Highlights

  • Anyuak, commonly spelled “Anywa”, is a language of the Western Nilotic branch of the Nilotic language family

  • Named entity recognition is a subtask of natural language processing that identifies and classifies named entities in a text document; it is a key component of NLP systems, especially in information retrieval, machine translation, automatic document summarization, and question answering [6]

  • To the best of our knowledge, this is the first NER system developed for the Anyuak language. It uses long short-term memory techniques, which extract features automatically from words and sentences and are efficient for sequential labelling tasks such as named entity recognition
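The highlights credit LSTMs with efficiency in sequential labelling. The sketch below shows, under stated assumptions, what a single-layer LSTM tagger computes at inference time: the standard input, forget, and output gates plus a candidate cell state at each token, followed by a linear layer that scores the five classes. The weights are random and untrained, the dimensions are arbitrary, and the class `TinyLSTMTagger` is hypothetical; it is not the authors' architecture.

```python
import math
import random
from typing import List, Tuple

# The five classes named in the abstract
LABELS = ["Person", "Time", "Organization", "Location", "Others"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W: List[List[float]], v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMTagger:
    """Hypothetical, untrained single-layer LSTM + linear output layer."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        concat = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g)
        self.W = {k: rand_matrix(hidden_dim, concat, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        # Output layer mapping the hidden state to one score per class
        self.W_out = rand_matrix(len(LABELS), hidden_dim, rng)

    def step(self, x: List[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        z = x + h  # concatenate current input with previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the old memory, add part of the new candidate
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def tag(self, sequence: List[List[float]]) -> List[str]:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        labels = []
        for x in sequence:
            h, c = self.step(x, h, c)
            scores = matvec(self.W_out, h)
            # Greedy choice of the highest-scoring class for this token
            labels.append(LABELS[max(range(len(LABELS)), key=lambda k: scores[k])])
        return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Five tokens, each represented by a hypothetical 4-dimensional feature vector
    sentence = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
    print(TinyLSTMTagger(input_dim=4, hidden_dim=8).tag(sentence))
```

A practical system would use a framework implementation such as PyTorch's `torch.nn.LSTM` and train the weights on annotated data; this pure-Python version only makes the gate arithmetic explicit.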


Summary

Introduction

Anyuak, commonly spelled “Anywa”, is a language of the Western Nilotic branch of the Nilotic language family. In Ethiopia, “Nilotic” denotes the Nilo-Saharan languages and their populations. Anyuak is spoken by the Anyuak community in the Gambella Regional State of southwestern Ethiopia and in adjacent border areas of southeastern Sudan [1,2]. Named entity recognition is a subtask of natural language processing that identifies and classifies named entities in a text document. It is a key component of NLP systems, especially in information retrieval, machine translation, automatic document summarization, and question answering [6]. These named entities belong to predefined categories and denote words or phrases such as person names, organization names, location names, and time expressions [7,8]. Detecting named entities in a text corpus is a significant step toward understanding and processing a document, especially formal documents and news reports [10].
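Because named entities may be multi-word phrases, token-level predictions over the five classes must be merged into phrases before they can be used downstream. A minimal sketch, assuming that adjacent tokens sharing the same non-Others label form one entity (the paper's label scheme does not state this), is the hypothetical helper `group_entities`:

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], labels: List[str],
                   outside: str = "Others") -> List[Tuple[str, str]]:
    """Merge runs of adjacent tokens with the same non-outside label into (phrase, label) pairs.

    Hypothetical helper: assumes plain class labels (no BIO prefixes).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label: Optional[str] = None
    for tok, lab in zip(tokens, labels):
        if lab != outside and lab == current_label:
            current.append(tok)  # extend the running entity
            continue
        if current:
            entities.append((" ".join(current), current_label))
        # Start a new entity, or reset if this token is not a named entity
        current, current_label = ([tok], lab) if lab != outside else ([], None)
    if current:
        entities.append((" ".join(current), current_label))
    return entities


if __name__ == "__main__":
    # English placeholder tokens and labels, not Anyuak text
    toks = ["Gambella", "Regional", "State", "is", "in", "Ethiopia"]
    labs = ["Location", "Location", "Location", "Others", "Others", "Location"]
    print(group_entities(toks, labs))
    # returns [("Gambella Regional State", "Location"), ("Ethiopia", "Location")]
```

Under this plain label set, two adjacent entities of the same class would be merged into one; BIO-style tags (Begin/Inside/Outside) are a common way to keep them apart.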
