The article examines the shortcomings of existing search engines for scientific publications and describes the search algorithms they use. Its aim is to develop a method for selecting publications on a given topic by assessing the relevance of keyword sets. A review of the literature analyzed during the research is presented; it includes work on the theory of set similarity, in particular the Jaccard coefficient and edit distance. We present a measure of similarity between keyword sets that is based on the Jaccard coefficient and takes keyword weighting coefficients into account. We also present an algorithm that determines how similar a publication is to a user's search query, using keyword sets with weighting coefficients. The algorithm builds on the similarity measure and the edit distance that we propose. It can be used to rank results in search engines for scientific publications, to compare the efficiency of different search engines, and to assess the quality of the results they return. It can also be applied in book and film recommendation systems based on user preferences. The article provides pseudocode for the algorithm and demonstrates, on a limited data set, how the measure it calculates changes with the distribution of keyword weights in the user's query and with the number of keywords.
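To make the idea of a weighted keyword-set similarity concrete, the following is a minimal sketch and not the article's algorithm, whose exact definition is given only in the full text. It assumes the common weighted form of the Jaccard coefficient (sum of per-keyword minimum weights divided by sum of maximum weights) and a hypothetical matching rule in which a document keyword is treated as a query keyword when their Levenshtein edit distance is at most `max_dist`. The function names, the threshold, and the sample weights are all illustrative assumptions.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def align_keywords(query: Dict[str, float], doc: Dict[str, float],
                   max_dist: int = 1) -> Dict[str, float]:
    """Rename document keywords to the nearest query keyword when the two
    are within max_dist edits (assumed rule, not taken from the article)."""
    aligned: Dict[str, float] = {}
    for kw, weight in doc.items():
        best = min(query, key=lambda q: edit_distance(kw, q), default=kw)
        target = best if edit_distance(kw, best) <= max_dist else kw
        # If two document keywords map to the same target, keep the larger weight.
        aligned[target] = max(aligned.get(target, 0.0), weight)
    return aligned


def weighted_jaccard(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Weighted Jaccard: sum of minimum weights over sum of maximum weights."""
    keys = set(query) | set(doc)
    num = sum(min(query.get(k, 0.0), doc.get(k, 0.0)) for k in keys)
    den = sum(max(query.get(k, 0.0), doc.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


# Illustrative data only: weights are made up for the example.
query = {"jaccard": 0.6, "similarity": 0.4}
publication = {"jacard": 0.5, "ranking": 0.5}  # "jacard" is a misspelling
score = weighted_jaccard(query, align_keywords(query, publication))
print(round(score, 3))
```

In this example the misspelled keyword is aligned with the query keyword before scoring, so the publication receives partial credit (roughly one third) rather than a score of zero. Ranking a result list would then amount to sorting publications by this score; whether the article combines the two ingredients in this way is not stated in the abstract.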