ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer

Ebrahim Chekol Jibril, A. Cuneyd Tantug

doi:10.1109/access.2023.3243468

Open Access

https://doi.org/10.1109/access.2023.3243468

Journal: IEEE Access	Publication Date: Jan 1, 2023
Citations: 3	License type: CC BY 4.0

Affiliation: Istanbul Technical University

Abstract

Named entity recognition is an information extraction task that serves as a pre-processing step for other natural language processing tasks, such as machine translation, information retrieval, and question answering. It identifies proper names as well as temporal and numeric expressions in open-domain text. For Semitic languages such as Arabic, Amharic, and Hebrew, the task is more challenging because these languages are heavily inflected. In this study, we annotate a new, comparatively large Amharic named entity recognition dataset and make it publicly available. Using this dataset, we build multiple Amharic named entity recognition systems based on recent deep learning approaches, including transfer learning (RoBERTa) and bidirectional long short-term memory coupled with a conditional random field layer. By applying the Synthetic Minority Over-sampling Technique to mitigate the class imbalance problem, our best-performing RoBERTa-based system achieves an F1-score of 93%, which is the new state-of-the-art result for Amharic named entity recognition.
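The abstract names the Synthetic Minority Over-sampling Technique but does not describe how it was applied to the NER data. As a rough illustration of the general idea only, the sketch below generates synthetic minority-class points by interpolating between a sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, the toy feature vectors, and the parameter values are all hypothetical and are not taken from the paper.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points built from minority-class samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Illustrative sketch only; not the authors' implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented entity class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. A maintained implementation is available in the imbalanced-learn library.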
