A recursive algorithm for open information extraction from Persian texts

Mahmoud Rahat, Seyedamin Monemian, Alireza Talebpour

doi:10.1504/ijcat.2018.092978

Abstract

With the proliferation of textual data accessible on the internet, researchers have focused on extending the Open Information Extraction (Open IE) paradigm to non-English languages. Adapting an Open IE system from English to Persian is challenging because the two languages differ fundamentally in syntax and in their dependency tree representations. To the best of our knowledge, this article is the first published paper on Open IE for Persian. Many traditional systems apply a large set of lexical patterns, which is inefficient on out-of-domain text. We replace this large pattern set with a few syntactic rules that are defined over the dependency parse of a sentence and designed specifically for Persian. We also address some Persian-specific phenomena to improve the results. The recursive nature of the algorithm enables it to handle nested sentences. Our experiments show that the proposed system achieves decent performance compared to state-of-the-art systems for English.
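To illustrate the general idea of recursive extraction over a dependency parse, the following minimal sketch walks a hand-built dependency tree and emits one (subject, verb, object) triple per clause, descending into embedded clauses. It is not the authors' algorithm or rule set: the `Node` class, the `extract` function, the single subject/object rule, and the Universal-Dependencies-style labels (`nsubj`, `obj`, `advcl`) are all assumptions made for illustration, and an English sentence is used only for readability.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One token in a dependency tree (hypothetical structure for this sketch)."""
    word: str
    pos: str   # coarse part-of-speech tag, e.g. "VERB" or "NOUN"
    rel: str   # dependency label to the parent (assumed UD-style labels)
    children: List["Node"] = field(default_factory=list)


def extract(node: Node) -> List[Tuple[str, str, str]]:
    """Collect (subject, verb, object) triples from this node and its subtree."""
    triples = []
    if node.pos == "VERB":
        subjects = [c.word for c in node.children if c.rel == "nsubj"]
        objects = [c.word for c in node.children if c.rel == "obj"]
        # Illustrative rule: a clause yields a triple only if it has both arguments.
        if subjects and objects:
            triples.append((subjects[0], node.word, objects[0]))
    # Recurse into every child so that verbs heading nested clauses are reached too.
    for child in node.children:
        triples.extend(extract(child))
    return triples


# Hand-built parse of "Ali read the letter because Sara sent a message".
tree = Node("read", "VERB", "root", [
    Node("Ali", "PROPN", "nsubj"),
    Node("letter", "NOUN", "obj", [Node("the", "DET", "det")]),
    Node("sent", "VERB", "advcl", [
        Node("because", "SCONJ", "mark"),
        Node("Sara", "PROPN", "nsubj"),
        Node("message", "NOUN", "obj", [Node("a", "DET", "det")]),
    ]),
])

for triple in extract(tree):
    print(triple)
```

Because the same function is applied to every subtree, the embedded clause headed by "sent" is handled by the same rule as the main clause, without any pattern written specifically for nested sentences.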

Full Text