ONTOLOGY-BASED APPROACH TO MODELING THE PROCESS OF EXTRACTING INFORMATION FROM TEXT

E Sidorova

doi:10.18287/2223-9537-2018-8-1-134-151

Abstract

The article deals with models and methods of knowledge representation aimed at automatic text processing and information extraction. In our approach, information extraction is treated as the process of populating an ontology with information represented as instances of domain concepts. Three basic models are proposed to describe this process. The text representation model defines the general scheme of text processing and maps the extracted information onto the text. The knowledge representation model includes a description of the subject vocabulary, genre models of texts, and fact models; these make it possible to model information extraction in terms of the semantic classes of the subject vocabulary and the ontology of the subject domain. The attributive data representation model preserves the data flows that arise during information extraction and allows ontological methods to be used to resolve ambiguity and coreference. The result is an original technique that lets users design a text analysis system and simulate information extraction based on the domain ontology.
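The following minimal Python sketch illustrates the central idea of the abstract: extraction populates an ontology with concept instances, and each attribute value stays linked to the text spans it came from. It is not the article's implementation. The class names `Span`, `Concept`, `Instance`, and `Ontology`, the helper `span_of`, and the coreference rule (merge two instances of the same concept that share at least one attribute and disagree on none) are all hypothetical illustrations.

```python
# Hypothetical sketch of ontology population with span provenance;
# not the article's implementation. All names here are illustrative.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """Character offsets [start, end) of a fragment of the source text."""
    start: int
    end: int


def span_of(text: str, fragment: str, start: int = 0) -> Span:
    """Locate `fragment` in `text` (searching from `start`) and return its span."""
    i = text.index(fragment, start)  # raises ValueError if absent
    return Span(i, i + len(fragment))


@dataclass(frozen=True)
class Concept:
    """A domain-ontology concept and the attribute names it admits."""
    name: str
    attributes: frozenset[str]


@dataclass
class Instance:
    """A concept instance; every attribute value remembers the spans it came from."""
    concept: Concept
    values: dict[str, str] = field(default_factory=dict)
    sources: dict[str, list[Span]] = field(default_factory=dict)

    def set(self, attr: str, value: str, span: Span) -> None:
        if attr not in self.concept.attributes:
            raise KeyError(f"{attr!r} is not an attribute of {self.concept.name}")
        self.values[attr] = value
        self.sources.setdefault(attr, []).append(span)

    def corefers_with(self, other: "Instance") -> bool:
        """Assumed naive heuristic: same concept, at least one shared
        attribute, and no conflicting values on the shared attributes."""
        if self.concept != other.concept:
            return False
        shared = self.values.keys() & other.values.keys()
        return bool(shared) and all(self.values[a] == other.values[a] for a in shared)

    def merge(self, other: "Instance") -> None:
        """Absorb `other`, keeping the text spans of both for every attribute."""
        for attr, value in other.values.items():
            self.values[attr] = value
            self.sources.setdefault(attr, []).extend(other.sources.get(attr, []))


class Ontology:
    """Holds the instances extracted so far (the populated part of the ontology)."""

    def __init__(self) -> None:
        self.instances: list[Instance] = []

    def populate(self, inst: Instance) -> Instance:
        """Merge `inst` into a coreferent instance if one exists, else add it."""
        for existing in self.instances:
            if existing.corefers_with(inst):
                existing.merge(inst)
                return existing
        self.instances.append(inst)
        return inst


if __name__ == "__main__":
    text = "Ivanov, the director, said that Ivanov met Petrov."
    person = Concept("Person", frozenset({"surname", "position"}))

    a = Instance(person)
    a.set("surname", "Ivanov", span_of(text, "Ivanov"))
    a.set("position", "director", span_of(text, "director"))

    b = Instance(person)
    b.set("surname", "Ivanov", span_of(text, "Ivanov", 1))  # second mention

    c = Instance(person)
    c.set("surname", "Petrov", span_of(text, "Petrov"))

    onto = Ontology()
    for inst in (a, b, c):
        onto.populate(inst)

    for inst in onto.instances:
        print(inst.concept.name, inst.values, inst.sources["surname"])
```

In this toy run the two mentions of "Ivanov" collapse into one `Person` instance whose `surname` keeps both spans, while "Petrov" stays separate. Keeping a list of spans per attribute is one simple way to preserve where each value came from, which loosely mirrors the role the abstract assigns to the attributive model; a real system would need much richer genre and fact models and a more careful coreference strategy.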
