Extraction of terms and semantic relationships from Arabic texts for automatic construction of an ontology

Mohammed El-Amine Abderrahim, Mohammed Alaeddine Abderrahim, Ali Benabdallah

doi:10.1007/s10772-017-9405-5

https://doi.org/10.1007/s10772-017-9405-5

Abstract

Building an ontology from a textual corpus starts with the conceptualization phase, which extracts the ontology's concepts; these concepts are then linked by semantic relationships. In this paper, we describe an approach to constructing an ontology from an Arabic textual corpus. We first collect the corpus and prepare it through normalization, stop-word removal and stemming. To extract the terms of our ontology, we then apply a statistical method for extracting simple and complex terms, called the "repeated segments method". To select segments with sufficient weight, we apply the term frequency-inverse document frequency (TF-IDF) weighting method, and to link these terms by semantic relationships we apply an automatic method for learning linguistic markers from text. This method requires a dataset of relationship pairs, which we extract from two external resources: an Arabic dictionary of synonyms and antonyms, and the lexical database Arabic WordNet. Finally, we present the results of our experiments on our textual corpus. The evaluation of our approach shows encouraging results in terms of recall and precision.
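
The term-extraction step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a "repeated segment" is taken to be a contiguous token sequence that occurs at least twice in the corpus, and the weight is the common tf * log(N / df) variant of TF-IDF. The function names, the thresholds and the placeholder tokens are hypothetical, and the input is assumed to be already normalized, stop-word filtered and stemmed.

```python
import math
from collections import Counter


def ngrams(tokens, max_len):
    """Yield every contiguous segment of 1..max_len tokens as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def repeated_segments(docs, max_len=2, min_count=2):
    """Keep segments that occur at least min_count times in the whole corpus
    (assumed reading of the 'repeated segments method')."""
    counts = Counter(seg for tokens in docs for seg in ngrams(tokens, max_len))
    return {seg for seg, c in counts.items() if c >= min_count}


def tf_idf(docs, segments, max_len=2):
    """Score each candidate segment per document with tf * log(N / df)."""
    per_doc_tf = [
        Counter(seg for seg in ngrams(tokens, max_len) if seg in segments)
        for tokens in docs
    ]
    # Document frequency: number of documents containing the segment.
    df = Counter(seg for tf in per_doc_tf for seg in tf)
    n_docs = len(docs)
    return [
        {seg: count * math.log(n_docs / df[seg]) for seg, count in tf.items()}
        for tf in per_doc_tf
    ]


# Placeholder tokens standing in for preprocessed Arabic stems (hypothetical data).
docs = [
    ["a", "b", "c", "a", "b"],
    ["a", "b", "d"],
    ["c", "d", "e"],
]
segments = repeated_segments(docs)
scores = tf_idf(docs, segments)
```

In this sketch a segment that appears in every document receives a weight of zero, so a selection threshold on the score would discard it; whether the paper applies the same variant or threshold cannot be determined from the abstract.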

Full Text