Abstract
This paper proposes a way to solve the problem of fast information search in unstructured information resources. Four main units that implement search by semantic meaning are constructed and described. The paper proposes an algorithm for assessing how well the semantic content of text documents corresponds to a given subject domain. The proposed infological approach is based on an analysis of patent search data, published scientific work, and pilot studies of effective methods for automatically assessing the content of unstructured information resources, with the aim of organizing information and analytical support for scientific activity. A method is proposed for assessing and comparing the thematic focus of data in unstructured information resources, based on the use of an infological system. The method involves clustering text documents by comparing the semantic content of the text under study with that of the anthology. The structure of the retrieval subsystem is described; it has a service-oriented client-server architecture with a thin client (a web browser). The method was tested on a set of texts obtained by monitoring open public Internet resources without restriction on subject matter (more than 1 million texts were collected and processed). From the collected texts, experts compiled a training sample for the following text types: literary texts, scientific and technical articles, pseudoscientific texts produced by automatic text-generation systems, and spam-containing texts. The composition and general architecture of the software of the infological system are described; the principal components of the system are cross-platform.
Based on the results of the pilot studies, the paper demonstrates that automated assessment of the thematic similarity of documents is feasible in principle, using the infological processing of the texts of work programs of academic disciplines as an example. Requirements are formulated for the program interface through which the prototype interacts with external search engines.
Key words: infological system, assessment of thematic similarity, information resource, work program of a discipline, competence, semantic analysis, semantic meaning.
Highlights
Key words: infological system, assessment of thematic similarity, information resource, work program of a discipline, competence, semantic analysis, semantic meaning
After loading an archive of 16 documents into the software prototype of the infological system, the user can display the semantic entities of the text as the conceptual environment of each document, by means of a visual graph of the concept hierarchy
The paper shows the information interaction among the blocks for technological preparation of input information resources, structuring and normalization of the description, topic determination and analysis of semantic meanings, and output of results, within a single technological cycle of infological processing of text documents
Summary
This paper proposes a method for assessing and comparing the thematic focus of data in unstructured information resources, based on the use of an infological system. The method involves clustering text documents by comparing the semantic content of the text under study with that of the anthology.
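The idea of grouping documents by comparing their content with reference texts can be illustrated with a minimal sketch. It is not the infological system described here: it substitutes plain word-count vectors and cosine similarity for the system's semantic analysis, and the names `term_vector`, `cosine_similarity`, `assign_topic`, and the sample reference texts are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_topic(document, references):
    """Return the label of the most similar reference text and all scores.

    `references` maps a label to a reference text. This is a stand-in
    (an assumption) for comparing a document against reference content.
    """
    doc_vec = term_vector(document)
    scores = {
        label: cosine_similarity(doc_vec, term_vector(text))
        for label, text in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference texts for two of the text types named in the abstract.
references = {
    "scientific": "method algorithm experiment data analysis results",
    "fiction": "she walked through the garden and remembered the summer",
}

label, scores = assign_topic(
    "the algorithm was evaluated on experiment data", references
)
print(label, scores)
```

Word counts capture only surface vocabulary overlap; a system that works with semantic meanings would replace `term_vector` with a concept-level representation while keeping the same compare-and-assign structure.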