Abstract

This paper proposes a method for Named-Entity Recognition (NER) in Tigrinya, a low-resource language, using a pre-trained language model. Tigrinya is morphologically rich, yet remains underrepresented in NLP, mainly due to the limited amount of annotated data available. To address this problem, we present the first publicly available NER datasets for Tigrinya, in two manually annotated versions (V1 and V2) containing 69,309 and 40,627 tokens, respectively, with annotations based on the CoNLL 2003 Beginning, Inside, and Outside (BIO) tagging scheme. We develop a new pre-trained language model for Tigrinya based on RoBERTa, which we refer to as TigRoBERTa, and fine-tune it on the downstream NER and part-of-speech (POS) tagging tasks with limited data. Finally, we further enhance performance by applying semi-supervised self-training using unlabeled data. Experimental results show that the method achieves an 84% F1-score for NER and 92% accuracy for POS tagging, better than or comparable to a baseline CNN-BiLSTM-CRF model.
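The abstract refers to the CoNLL 2003 BIO tagging scheme. As a minimal illustration (the sentence, entity spans, and helper function below are our own example, not drawn from the paper's datasets), each token receives a tag marking whether it Begins an entity, is Inside one, or is Outside any entity:

```python
# Illustrative BIO-tagged sentence (hypothetical example, not from the datasets).
tokens = ["Asmara", "is", "the", "capital", "of", "Eritrea"]
tags   = ["B-LOC",  "O",  "O",   "O",       "O",  "B-LOC"]

def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) spans from a BIO-tagged sentence."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)           # continue the open entity
        else:                             # "O" closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:                           # flush a sentence-final entity
        entities.append((" ".join(current), etype))
    return entities

print(extract_entities(tokens, tags))
# → [('Asmara', 'LOC'), ('Eritrea', 'LOC')]
```

NER models trained on such data, including the paper's TigRoBERTa and the CNN-BiLSTM-CRF baseline, predict one BIO tag per token; span-level F1 is then computed over the decoded entities.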
