Abstract
This research paper develops a Named Entity Recognition (NER) system for Azerbaijani, a low-resource language. It builds NER models using two approaches, rule-based and machine learning-based, and compares their performance on familiar and unfamiliar datasets to determine which approach works best. The rule-based approach relies mainly on statistical techniques and achieves adequate results, with an F-score of about 70% on both datasets. The machine learning approach comprises three models. The first is trained from scratch with a Convolutional Neural Network (CNN) using the spaCy library and gives the best outcome, with an F-score above 90% on each test dataset. The second, a pre-trained multilingual spaCy model, is used for comparison; it scores below 50% in testing, which highlights the importance of the domain in which an NER model is trained. Finally, a third model is trained on top of the multilingual model using the training dataset, and it performs best within its own domain.
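All models above are compared by F-score. The abstract does not say whether scores are computed per entity or per token, so the sketch below assumes exact-match, entity-level scoring (common in CoNLL-style NER evaluation): a prediction counts as correct only if both its span and its label match a gold entity. The function name `entity_f_score` and the `(start, end, label)` tuple format are hypothetical choices for illustration, not the paper's evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def entity_f_score(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match entity-level scoring.

    Assumption: a predicted entity is a true positive only if its span
    boundaries and its label both match a gold entity exactly.
    """
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: "Nigar Bakıda yaşayır" ("Nigar lives in Baku").
    gold = [(0, 1, "PER"), (1, 2, "LOC")]
    predicted = [(0, 1, "PER"), (1, 2, "ORG")]  # second entity has the wrong label
    print(entity_f_score(gold, predicted))  # → (0.5, 0.5, 0.5)
```

Under this strict scheme a correct span with the wrong label is penalised twice (once in precision, once in recall); token-level or partial-match schemes would give different numbers for the same predictions.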