This analytical review article is a comprehensive study of modern text analysis methods for identifying and measuring the degree of similarity between texts. This is an important and relevant task, and the article examines and analyzes the tools used to solve it. The introduction discusses the purpose of the work, the relevance of the problem, and the importance of developing effective methods for comparing texts. The main part of the article examines and analyzes, one by one, Jaccard similarity, the shingle algorithm, Levenshtein distance, TF-IDF and BM25, and BERT and the use of neural networks. The application of each method is illustrated by examples presented in tables and figures. For Jaccard similarity, the ways it is applied and its limitations are considered. For the shingle algorithm, the advantages of the method in the context of similarity search are identified. The publication discusses methods based on string distance in detail, including Levenshtein distance, with special attention paid to its scope of application and its advantages over other methods. For statistical methods such as TF-IDF and BM25, an analysis of their application and effectiveness in text similarity search is given. The article is not limited to traditional methods; it also covers modern ones, including BERT and the use of neural networks. These methods are compared with each other, and their advantages and disadvantages are identified. The conclusion provides an objective comparative analysis of all the presented methods, highlighting their characteristics and areas of application.
The importance of choosing the most appropriate method for text similarity search, depending on the specific search goals, tasks, and requirements, is noted, and the use of neural networks is identified as the most widely used, broadly applicable, and productive approach. The conclusions emphasize that the main goal of the article, which is devoted to a comparative analysis of various methods for finding similarity between texts, is to develop recommendations for choosing the optimal method.