Abstract
This paper presents a novel approach to extracting information from corpora in order to build ontologies for a wide range of applications. Our goal is to propose a domain-independent method, based on a distributional analysis of semantic units, that brings out all candidate informative elements (concepts, entities, semantic relations, named entities, etc.). The method relies on a four-stage pipeline that extracts information from unstructured text as a series of decomposable representations (sentences as triplets, ‘argumental structure’, etc.) until a consistent final ontology is obtained. We applied this pipeline to repeated samples of 100 articles drawn at random from a text corpus (the 2013 annual edition of ‘Le Monde’). Evaluation of a trial implementation of our system shows an accuracy of up to 74%. These results indicate that the proposed methodology is quite generic and can easily be adapted to any new domain.