Abstract

Named-Entity Recognition (NER) is challenging for languages with few digital resources. The main difficulty is the scarcity of annotated corpora, which makes training an effective NER model problematic. We propose a model customized to linguistic properties to compensate for this lack of resources in low-resource languages such as Persian. Motivated by the pronoun-dropping behavior and subject-object-verb word order of Persian, we propose a new weighted relative positional encoding in the self-attention mechanism. Using the pointwise mutual information (PMI) factor, we inject co-occurrence information into the context representation. We trained and tested our model on three datasets, Arman, Peyma, and ParsTwiNER, achieving word-level F1 scores of 94.16%, 93.36%, and 84.49%, respectively. The experiments show that our model outperforms other Persian NER models. An ablation study and a case study further show that our method converges faster and is less prone to overfitting.
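The co-occurrence signal mentioned above rests on pointwise mutual information, PMI(x, y) = log p(x, y) / (p(x) p(y)). As a minimal sketch, PMI over word pairs within a sliding window could be computed as follows (the window size and the way scores are injected into the context representation are the paper's design choices, not reproduced here):

```python
import math
from collections import Counter

def pmi_scores(sentences, window=2):
    """Compute PMI(x, y) = log(p(x, y) / (p(x) * p(y))) for word pairs
    co-occurring within `window` tokens of each other."""
    word_counts = Counter()   # unigram counts
    pair_counts = Counter()   # co-occurrence counts (unordered pairs)
    total_words = 0
    total_pairs = 0
    for sent in sentences:
        word_counts.update(sent)
        total_words += len(sent)
        for i, w in enumerate(sent):
            # pair each word with the next `window` words to its right
            for v in sent[i + 1:i + 1 + window]:
                pair_counts[tuple(sorted((w, v)))] += 1
                total_pairs += 1
    scores = {}
    for (w, v), c in pair_counts.items():
        p_xy = c / total_pairs
        p_x = word_counts[w] / total_words
        p_y = word_counts[v] / total_words
        scores[(w, v)] = math.log(p_xy / (p_x * p_y))
    return scores
```

A positive score marks pairs that co-occur more often than chance would predict; such scores can then serve as weights when enriching token representations.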
