Abstract

Background. Efficient on-demand search for relevant information requires dedicated methods and algorithms. This article deals with the consolidation of data for subsequent use in information and analytical systems. Objective. The aim of the paper is to identify the capabilities of, and to build, algorithms that search for relevant information in disparate sources by analyzing probabilistic information about the possible presence of relevant documents in those sources. Methods. To find information relevant to a search query, the approach uses probability estimates of the presence of relevant documents in the sources and subsequently increases the number of documents selected from these sources for relevance analysis against the query. Results. A stochastic programmable automaton structure that selects the most probable information sources according to relevance parameters was developed, together with an information retrieval algorithm based on this automaton. Conclusions. The described stochastic-automaton algorithm for data consolidation makes it possible to develop a set of software tools and provides a sufficiently complete and holistic solution to the data consolidation problem for diverse systems that search information sources differing in composition and presentation type.

Highlights

  • Directed information gathering from open sources is considered one of the standard methods of information gathering in various spheres of modern society

  • Many information search models exist today, based on various mathematical methods, that can form the basis of an information consolidation system

  • When processing a large number of documents, the described stochastic algorithm performed twice as well, measured by the number of relevant documents selected, as direct search algorithms based on Boolean and vector models
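The idea summarized in the abstract, choosing sources according to probability estimates and drawing more documents from sources that turn out to contain relevant ones, can be illustrated with a small sketch. The abstract does not give the automaton's actual structure or update rule, so the code below substitutes a standard linear reward-inaction learning automaton as an assumption. The function name `select_documents`, the `sources` dictionary layout, the `is_relevant` callback, and the `rate` parameter are all hypothetical and chosen only for illustration.

```python
import random


def select_documents(sources, is_relevant, budget, rate=0.1, seed=0):
    """Sample documents from sources, favouring sources that yield relevant ones.

    sources: dict mapping source name -> list of documents (hypothetical layout)
    is_relevant: callable(document) -> bool, stands in for the query relevance test
    budget: maximum number of documents to inspect
    rate: learning rate of the linear reward-inaction update (assumed scheme,
          not taken from the paper)
    """
    rng = random.Random(seed)
    names = list(sources)
    # Start from a uniform probability estimate over sources.
    probs = {name: 1.0 / len(names) for name in names}
    # Copy the document lists so the caller's data is not modified.
    remaining = {name: list(docs) for name, docs in sources.items()}
    selected = []

    for _ in range(budget):
        active = [n for n in names if remaining[n]]
        if not active:
            break  # every source is exhausted
        # Draw a source in proportion to its current probability estimate.
        weights = [probs[n] for n in active]
        chosen = rng.choices(active, weights=weights, k=1)[0]
        doc = remaining[chosen].pop()
        if is_relevant(doc):
            selected.append(doc)
            # Reward: shift probability mass toward the chosen source.
            for n in names:
                if n == chosen:
                    probs[n] += rate * (1.0 - probs[n])
                else:
                    probs[n] *= 1.0 - rate
        # On a miss the probabilities stay unchanged (reward-inaction).
    return selected, probs
```

Calling `select_documents(sources, is_relevant, budget=40)` returns the relevant documents found and the final probability estimate per source. Under this assumed update rule the estimates always sum to 1, and a source gains probability only when it has just supplied a relevant document, so later draws concentrate on the sources that have proved useful.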


Introduction
Problem statement
Review of existing solutions
The mathematical formulation of the problem
The consolidation algorithm based on the use of stochastic automaton
Number of processed documents
Conclusions
References
Using a Stochastic Automaton for Data Consolidation (Ukrainian-language title)
Findings
Using a Stochastic Automaton for Data Consolidation (Russian-language title)