Arabic NER Evaluation: Pre-Trained Models via Contrastive Learning vs. LLM Few-Shot Prompting

Passant Elchafei, Amany Fashwan

doi:10.1016/j.procs.2024.10.196

Abstract

Developing Natural Language Processing (NLP) tools for the Arabic language and its dialects is very challenging. Named Entity Recognition (NER), a core component of many NLP systems such as information extraction, question answering, machine translation and knowledge graph building, is one of these challenges. This paper sheds light on applying different approaches to Arabic NER (Flat and Nested) using a large and rich Arabic NER corpus, the Wojood dataset, which consists of about 550K tokens annotated with 21 entity types. First, we apply the Wojood base model, AraBERTv2, along with various other Arabic BERT models such as MARBERTv2, CAMeLBERT and mBERT, among others. Next, we utilize the Bi-Encoder Contrastive Learning (CL) approach, a framework developed by Microsoft, which maps candidate text spans and entity types into the same vector representation space. The primary challenge in this approach is distinguishing non-entity spans from entity mentions. This approach achieves an F1 score of 91.25% for Flat and 91.40% for Nested NER. Additionally, to evaluate the predicted NER, we employ few-shot prompting on LLaMA and GPT-3.5 using a refined prompt-based strategy. Our findings reveal that LLaMA outperforms GPT-3.5.
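
The bi-encoder idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors, the fixed similarity threshold, the temperature, and the helper names (`cosine`, `label_spans`, `info_nce_loss`) are all assumptions chosen for illustration, and the transliterated strings stand in for Arabic text spans. In a real system the span and entity-type vectors would come from trained encoders, and the boundary between entity and non-entity spans would be learned rather than fixed by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_spans(span_vecs, type_vecs, threshold=0.5):
    """Give each span its most similar entity type, or "O" (non-entity)
    when even the best similarity falls below the threshold."""
    labels = {}
    for span, s_vec in span_vecs.items():
        best_type, best_sim = max(
            ((t, cosine(s_vec, t_vec)) for t, t_vec in type_vecs.items()),
            key=lambda pair: pair[1],
        )
        labels[span] = best_type if best_sim >= threshold else "O"
    return labels


def info_nce_loss(span_vec, type_vecs, gold_type, temperature=0.1):
    """InfoNCE-style contrastive loss for one span: low when the span is
    closer to its gold entity type than to the other types."""
    logits = {t: cosine(span_vec, v) / temperature for t, v in type_vecs.items()}
    log_denominator = math.log(sum(math.exp(x) for x in logits.values()))
    return log_denominator - logits[gold_type]


# Hypothetical embeddings in a shared space (illustrative values only).
type_vecs = {"PERS": [1.0, 0.0, 0.0], "GPE": [0.0, 1.0, 0.0]}
span_vecs = {
    "Mahmoud Darwish": [0.9, 0.1, 0.0],   # close to PERS
    "Ramallah": [0.1, 0.95, 0.05],        # close to GPE
    "went to": [0.3, 0.3, 0.9],           # far from every type
}

labels = label_spans(span_vecs, type_vecs)
for span, label in labels.items():
    print(f"{span}: {label}")
```

Because every candidate span is scored independently against the entity types, overlapping spans can each receive a label, which is one reason a span-based formulation like this can cover both Flat and Nested NER. The threshold branch in `label_spans` corresponds to the challenge the abstract names: separating non-entity spans from entity mentions.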

