Development of fuzzy search method for creating an efficient information search system in text data

Kyrylo Kleshch

doi:10.15587/2706-5448.2024.298425

Abstract

The object of research is the process of efficiently searching for information in a set of text data. The subject of research is a fuzzy search method that addresses this problem. The paper describes the development of a fuzzy search method that consists of 9 consecutive steps and is intended for quickly finding matches in a large set of text data. Based on this method, it is proposed to create a fuzzy search system that finds the most relevant documents in a document collection. The proposed method combines the advantages of algorithms based on deterministic finite automata with those of dynamic programming algorithms for calculating the Damerau-Levenshtein distance. This combination makes it possible to implement the symbol similarity table in an optimal way. As part of the work, an approach to building a symbol similarity table is proposed, and an example table is created for the symbols of the English alphabet. The table makes it possible to find the degree of similarity between two symbols in constant time and to convert a symbol into its basic counterpart. For document filtering, a metric was developed that evaluates how well text data corresponds to a search phrase; it takes into account both the number of found and not found characters and the number of found and not found words. The Damerau-Levenshtein algorithm finds the edit distance between two words, taking into account the following types of errors: substitution, insertion, deletion, and transposition of characters. The work proposes a modification of this algorithm that uses the similarity table to estimate the edit distance between two words more accurately.
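The idea of weighting substitutions by symbol similarity can be illustrated with a minimal sketch. It is not the paper's 9-step method: it uses the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, and the `SIMILARITY` table, its values, and the names `similarity` and `weighted_osa_distance` are assumptions made purely for illustration.

```python
# Hypothetical symbol similarity table: values lie in [0, 1], where a higher
# value means "more alike". The entries are illustrative, not the paper's table.
SIMILARITY = {
    frozenset("il"): 0.8,
    frozenset("o0"): 0.9,
    frozenset("mn"): 0.5,
}


def similarity(a: str, b: str) -> float:
    """Constant-time lookup of the similarity between two symbols."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def weighted_osa_distance(s: str, t: str) -> float:
    """Edit distance with insertion, deletion, substitution and adjacent
    transposition, where substitution costs 1 - similarity(a, b)."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # delete all i leading symbols of s
    for j in range(cols):
        d[0][j] = float(j)  # insert all j leading symbols of t
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 1.0 - similarity(s[i - 1], t[j - 1])
            d[i][j] = min(
                d[i - 1][j] + 1.0,           # deletion
                d[i][j - 1] + 1.0,           # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
            # Transposition of two adjacent symbols.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[-1][-1]


print(weighted_osa_distance("serach", "search"))  # one transposition: 1.0
print(weighted_osa_distance("mail", "mall"))      # similar symbols: cheaper than 1
```

With this weighting, replacing a symbol with a visually or phonetically close one is penalized less than replacing it with an unrelated one, which is the effect the similarity table is meant to achieve.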
The developed method makes it possible to create a fuzzy search system that helps find the desired results faster and increases the relevance of the results by sorting them according to the values of the proposed text data similarity metric.
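Filtering and sorting documents by a match metric might look roughly like the sketch below. The abstract does not give the metric's formula, so `match_score` is a hypothetical stand-in: it averages the share of query words found and the share of query characters found, and it uses exact word matching where a real system would use fuzzy matching.

```python
def match_score(query: str, document: str) -> float:
    """Hypothetical score in [0, 1] built from found/not found words and
    characters. This is an assumption, not the metric from the paper."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    doc_words = set(document.lower().split())
    # Exact matching keeps the sketch short; a fuzzy matcher would go here.
    found = [w for w in query_words if w in doc_words]
    word_ratio = len(found) / len(query_words)
    char_ratio = sum(map(len, found)) / sum(map(len, query_words))
    return 0.5 * word_ratio + 0.5 * char_ratio


def rank(query: str, documents: list[str]) -> list[str]:
    """Drop documents with a zero score and sort the rest, best first."""
    scored = [(match_score(query, doc), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = ["fuzzy search in text data", "text mining", "power systems"]
print(rank("fuzzy text search", docs))
```

In this example the first document contains every query word, the second contains only one, and the third contains none and is filtered out.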

Journal: Technology audit and production reserves
Publication Date: Feb 13, 2024
License type: CC BY 4.0