Relation Extraction from Arabic Wikipedia

Gehad Zakria, Khaled Fathy, Malak N Makar, Mamdouh Farouk

doi:10.17485/ijst/2019/v12i46/147512


Open Access

https://doi.org/10.17485/ijst/2019/v12i46/147512


Journal: Indian Journal of Science and Technology	Publication Date: Dec 20, 2019
Citations: 10	License type: cc-by

Affiliation: Assiut University

Abstract

Objectives/Methods: This study aims to extract relations between entities in Arabic text. Relation Extraction is one of the most important tasks in text mining and a key step for many applications, such as extracting triples from text, Question Answering, and ontology building. However, extracting relations from Arabic text is harder than from English because annotated Arabic corpora are scarce. This paper proposes a method for extracting relations from Arabic text based on the characteristics of Arabic Wikipedia articles. The proposed system extracts, from each Wikipedia article, the sentences that contain the principal entity, a secondary entity, and a relation between them; WordNet and DBpedia are then used to build the training set. Finally, a Naive Bayes classifier is trained and tested on the resulting datasets. Findings: Few works extract relations from Arabic text, and these rely on classification, clustering, or rule-based methods. Application/improvement: The experiments show the effectiveness of the proposed approach, which achieves a precision of 89% in classifying 19 types of semantic relations.

Keywords: Relation Extraction, Arabic Wikipedia, Semantic Relation, Arabic language.
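The classification step can be illustrated with a small multinomial Naive Bayes classifier written from scratch. The Naive Bayes variant, the features, and the toolkit used by the authors are not specified here, so this is only a sketch under assumptions: bag-of-words features taken from the words around the two entities, add-one (Laplace) smoothing, and hypothetical relation labels (`birthPlace`, `spouse`, `occupation`) and training examples that are not the paper's 19 relation types or its data. The class name `NaiveBayesRelationClassifier` is likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token features."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)      # training sentences per relation
        self.word_counts = defaultdict(Counter)  # token frequencies per relation
        self.vocab = set()
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label); tokens unseen in training are skipped
        score = math.log(self.class_counts[label] / self.total)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        # Pick the relation with the highest posterior log-probability
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Hypothetical toy training data: tokens found between the two entities of a sentence
# (English glosses stand in for Arabic tokens), labelled with DBpedia-style relation names.
train = [
    (["born", "in"], "birthPlace"),
    (["was", "born", "in", "city"], "birthPlace"),
    (["married", "to"], "spouse"),
    (["his", "wife"], "spouse"),
    (["worked", "as"], "occupation"),
    (["profession", "is"], "occupation"),
]
clf = NaiveBayesRelationClassifier().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict(["born", "in", "the"]))       # birthPlace
print(clf.predict(["his", "wife", "married"]))  # spouse
```

Working in log space avoids numeric underflow when a sentence has many tokens; a library implementation such as scikit-learn's `MultinomialNB` would be a natural substitute for real data.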

Highlights

The amount of information on the Internet is increasing significantly
This study uses Arabic Wikipedia to extract relations between the principal entity and secondary entities
The proposed system was evaluated on a number of sentences collected at random from different Wikipedia pages
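Precision for a relation is the fraction of sentences predicted as that relation that are actually correct. How the reported 89% was averaged over the 19 relation types is not stated, so the following sketch assumes per-relation precision combined by an unweighted (macro) average. The function names and the gold/predicted labels are hypothetical.

```python
from collections import Counter


def per_relation_precision(gold, predicted):
    """Precision per predicted relation: correct predictions / all predictions of that relation."""
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in zip(gold, predicted) if g == p)
    return {rel: correct_counts[rel] / n for rel, n in predicted_counts.items()}


def macro_precision(gold, predicted):
    """Unweighted mean of the per-relation precision scores."""
    scores = per_relation_precision(gold, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical gold and predicted labels for four test sentences.
gold = ["birthPlace", "spouse", "spouse", "occupation"]
predicted = ["birthPlace", "spouse", "occupation", "occupation"]
print(per_relation_precision(gold, predicted))  # {'birthPlace': 1.0, 'spouse': 1.0, 'occupation': 0.5}
print(round(macro_precision(gold, predicted), 3))  # 0.833
```

A micro-averaged precision (all correct predictions divided by all predictions, here 3/4) would give a different number, which is why the averaging scheme matters when comparing reported scores.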

Summary

Introduction

The amount of information on the Internet is increasing significantly. Recent work on the Semantic Web aims to access this information semantically in order to answer queries that exceed the capabilities of standard search engines. Wikipedia (www.wikipedia.org), a Wikimedia Foundation project, is one of the most important sources of information and the largest online encyclopedia in the world. Its articles are written and updated every day by volunteer contributors. Information Extraction (IE) aims to derive structured information from unstructured text by identifying the entities in the text and automatically determining the relations between them. Relation Extraction (RE) is a central task of IE.
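The step that turns raw article text into candidate (entity, entity, sentence) triples for relation classification can be sketched as follows. This is a minimal illustration under stated assumptions: the principal entity is taken to be the article title, the secondary entities are assumed to be given (for example, the entities the article links to), matching is plain substring search, and the function names and example text are hypothetical. Real Wikipedia sentences often refer to the principal entity with a pronoun, which this sketch does not handle.

```python
import re


def split_sentences(text):
    """Split text after sentence-final punctuation, including the Arabic question mark '؟'."""
    return [s for s in re.split(r"(?<=[.!?؟])\s+", text.strip()) if s]


def candidate_triples(principal, secondary_entities, text):
    """Return (principal, secondary, sentence) for each sentence that mentions both entities."""
    triples = []
    for sentence in split_sentences(text):
        if principal not in sentence:
            continue  # the principal entity must appear literally in the sentence
        for secondary in secondary_entities:
            if secondary in sentence:
                triples.append((principal, secondary, sentence))
    return triples


# Hypothetical article about Naguib Mahfouz (نجيب محفوظ) with two linked entities:
# Cairo (القاهرة) and the Nobel Prize (جائزة نوبل).
principal = "نجيب محفوظ"
secondary = ["القاهرة", "جائزة نوبل"]
text = (
    "ولد نجيب محفوظ في القاهرة. "  # Naguib Mahfouz was born in Cairo.
    "حصل نجيب محفوظ على جائزة نوبل في الأدب. "  # Naguib Mahfouz won the Nobel Prize in Literature.
    "تعد الثلاثية من أشهر أعماله."  # The Trilogy is among his best-known works.
)
for p, s, sent in candidate_triples(principal, secondary, text):
    print(p, "|", s, "|", sent)
```

Only the first two sentences survive the filter; the third mentions neither entity by name, so it yields no candidate.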
