Abstract

This paper presents a novel approach to extracting information from corpora in order to build ontologies for a wide range of applications. Our goal is to propose a domain-independent method, based on a distributional analysis of semantic units, that brings out all candidate informative elements (concepts, entities, semantic relations, named entities, etc.). The method relies on a pipeline of four main stages that extracts information from unstructured text as a series of decomposable representations (sentences as triplets, ‘argumental structure’, etc.) until a consistent final ontology is obtained. We applied this pipeline to repeated samples of 100 articles randomly drawn from a text corpus (the 2013 annual edition of ‘Le Monde’). The evaluation of the trial implementation of our system shows an accuracy of up to 74%. These results indicate that the proposed methodology is quite generic and can be easily adapted to new domains.
