Abstract

Building an ontology from a textual corpus starts with the conceptualization phase, which extracts the ontology's concepts; these concepts are then linked by semantic relationships. In this paper, we describe an approach to constructing an ontology from an Arabic textual corpus. We first collect and prepare the corpus through normalization, stop-word removal, and stemming. To extract the terms of our ontology, we then apply a statistical method for extracting simple and complex terms, called "the repeated segments method". To select segments with sufficient weight, we apply the term frequency-inverse document frequency (TF-IDF) weighting method, and to link these terms by semantic relationships, we apply an automatic method for learning linguistic markers from text. This method requires a dataset of relationship pairs, which we extract from two external resources: an Arabic dictionary of synonyms and antonyms and the lexical database Arabic WordNet. Finally, we present the results of our experiments on our textual corpus. The evaluation of our approach shows encouraging results in terms of recall and precision.
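
As a rough illustration of the term-extraction step, the following minimal sketch treats a repeated segment as any word n-gram that occurs at least twice in the corpus, and then scores each segment with a basic TF-IDF weight. The abstract does not give the exact definitions used in the paper, so the n-gram reading of "repeated segments", the TF-IDF variant, the function names (`ngrams`, `repeated_segments`, `tf_idf`), the parameters (`max_len`, `min_count`), and the transliterated placeholder tokens are all assumptions made for illustration. The input documents are assumed to be already normalized, stripped of stop words, and stemmed.

```python
import math
from collections import Counter


def ngrams(tokens, max_len):
    """Yield every contiguous word sequence of length 1..max_len as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def repeated_segments(docs, max_len=3, min_count=2):
    """Return segments occurring at least min_count times across the corpus.

    Assumption: a "repeated segment" is a word n-gram seen more than once.
    """
    counts = Counter(seg for doc in docs for seg in ngrams(doc, max_len))
    return {seg for seg, count in counts.items() if count >= min_count}


def tf_idf(docs, segments, max_len=3):
    """Score each segment by its highest TF-IDF value over all documents.

    Assumption: tf = occurrences / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    per_doc = [
        Counter(seg for seg in ngrams(doc, max_len) if seg in segments)
        for doc in docs
    ]
    # Document frequency: number of documents containing each segment.
    doc_freq = Counter(seg for counts in per_doc for seg in counts)
    scores = {}
    for seg in segments:
        idf = math.log(n_docs / doc_freq[seg])
        scores[seg] = max(
            counts[seg] / max(len(doc), 1) * idf
            for counts, doc in zip(per_doc, docs)
        )
    return scores


# Toy corpus of transliterated placeholder stems (hypothetical data).
docs = [
    ["ilm", "hasub", "ilm", "hasub"],
    ["ilm", "lugha"],
    ["kitab", "lugha"],
]
segments = repeated_segments(docs, max_len=2, min_count=2)
scores = tf_idf(docs, segments, max_len=2)
for seg, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(" ".join(seg), round(score, 3))
```

In this toy corpus the complex term `ilm hasub` is kept because it occurs twice, and it outscores the simple term `ilm`, which appears in more documents and therefore has a lower inverse document frequency. A weight threshold on these scores would then select the candidate terms.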
