Abstract

Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of several natural language processing tasks, such as information extraction, knowledge extraction, question answering, and knowledge base population. The main motivations for this research stem from the lack of a relation extraction dataset for the Persian language, as well as the necessity of extracting knowledge from the growing volume of Persian-language data for different applications. In this paper, we present "PERLEX", the first Persian dataset for relation extraction, which is an expert-translated version of the "SemEval-2010-Task-8" dataset. Moreover, this paper addresses Persian relation extraction utilizing state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset: a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments yield a maximum F1-score of 77.66% (achieved by the BERTEM-MTB method), establishing the state of the art of relation extraction in the Persian language.
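The 77.66% figure above is a macro-style F1-score over relation classes. As a hedged sketch (the paper's exact evaluation protocol may differ, e.g. in how the "Other" class is handled), macro-averaged F1 over predicted relation labels can be computed as:

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over relation labels (toy sketch).

    gold, pred: parallel lists of relation labels, one per sentence.
    """
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
        fp = sum(1 for g, p in zip(gold, pred) if p == label and g != label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Average per-class F1 scores with equal weight per class.
    return sum(scores) / len(scores)

# Illustrative labels only (not from the PERLEX dataset).
gold = ["Cause-Effect", "Instrument-Agency", "Cause-Effect", "Other"]
pred = ["Cause-Effect", "Cause-Effect", "Cause-Effect", "Other"]
print(round(macro_f1(gold, pred), 3))  # → 0.6
```

Macro averaging weights every relation class equally, so rare classes such as Instrument-Agency influence the score as much as frequent ones.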

Highlights

  • Relation extraction (RE) is the task of identifying semantic relations between text entities and is one of the most crucial tasks in natural language processing (NLP)

  • The models do not behave the same on the Instrument-Agency class: there, the baseline model outperforms all models except BERTEM-MTB. This is because the baseline model uses the dependency relations between the two entities, and their direction, for this purpose, while the other models do not use this information

  • The relation extraction (RE) task in the Persian language is conducted for the first time

Introduction

Relation extraction (RE) is the task of identifying semantic relations between text entities and is one of the most crucial tasks in natural language processing (NLP). The RE task needs to be performed using the "Located In" predicate and "Tehran" as the relationship's object to enable this information to be extracted. Another example of the application of RE is in question answering. The application of Bidirectional LSTM Networks with Entity-Aware Attention using Latent Entity Typing (BLSTM-LET) to the RE task was regarded as one of the state-of-the-art language-agnostic approaches. BERT [9] is a contextual text representation model that was shown to achieve state-of-the-art results in 11 different NLP tasks. The BERTEM-MTB [12] model has been shown to be the state of the art for the RE task on both the SemEval-2010-Task-8 and TACRED datasets.
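An RE system's output can be viewed as a (subject, predicate, object) triple. As a minimal sketch of the input side, SemEval-2010-Task-8 (and hence PERLEX) marks the two entities with `<e1>`/`<e2>` tags; the sentence and relation below are hypothetical examples in that style, consistent with the "Located In"/"Tehran" case mentioned above:

```python
import re

def extract_entities(sentence):
    """Pull the two marked entities out of a SemEval-style annotated sentence."""
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return e1, e2

# Hypothetical annotated sentence with "Tehran" as the relationship's object.
sentence = "<e1>Azadi Tower</e1> is located in <e2>Tehran</e2>."
e1, e2 = extract_entities(sentence)
predicted_relation = "Located In"  # in practice, produced by a trained classifier
print((e1, predicted_relation, e2))  # → ('Azadi Tower', 'Located In', 'Tehran')
```

The models discussed in this paper differ only in how they predict the relation label; the entity-marked input format is shared across all of them.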

