Abstract

In this paper, we explore the application of the Bidirectional Encoder Representations from Transformers (BERT) model for Named Entity Recognition (NER) in the Albanian language. Despite the success of BERT in various NLP tasks across numerous languages, Albanian remains under-represented. Our approach leverages a transfer learning strategy with multilingual BERT, fine-tuned on a newly created, comprehensive Albanian language corpus. The corpus includes texts from various domains, thereby facilitating the identification of a broad range of named entities. We report on the challenges faced during corpus creation, model fine-tuning, and testing phases, such as dealing with dialectal variations and lack of existing resources for the Albanian language. This research not only contributes to the advancement of NER systems for low-resource languages, but also provides a robust foundation for further advancements in Albanian language processing.
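The abstract describes fine-tuning multilingual BERT for token-level NER. One step any such pipeline needs is aligning word-level entity tags to BERT's subword tokens, since a single Albanian word may be split into several wordpieces. A minimal sketch of that alignment, assuming the common convention of labeling only the first subword and masking the rest with `-100` (the label scheme and example words are illustrative, not taken from the paper):

```python
def align_labels(word_labels, word_ids):
    """Map word-level NER labels onto subword tokens.

    word_labels: one integer tag per original word.
    word_ids: for each subword token, the index of the word it came
              from, or None for special tokens ([CLS], [SEP]).
    """
    aligned = []
    prev = None
    for wid in word_ids:
        if wid is None:
            aligned.append(-100)  # special tokens: ignored by the loss
        elif wid != prev:
            aligned.append(word_labels[wid])  # first subword keeps the tag
        else:
            aligned.append(-100)  # continuation subwords are masked out
        prev = wid
    return aligned


# Hypothetical example: "Tirana është kryeqyteti" with tags
# B-LOC=1, O=0, O=0, where "Tirana" splits into two wordpieces.
labels = align_labels([1, 0, 0], [None, 0, 0, 1, 2, None])
# → [-100, 1, -100, 0, 0, -100]
```

With fast tokenizers in the Hugging Face `transformers` library, the `word_ids` sequence is available from the tokenizer's output, so this helper slots directly into a token-classification fine-tuning loop.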
Received: 21 September 2023 / Accepted: 29 October 2023 / Published: 23 November 2023

