Algorithms for Finding Duplicate Conferences and Conference Groups in Scientometric Systems

A S Kozitsyn

doi:10.17587/prin.14.195-202

Abstract

This article presents methods developed by the author for measuring the similarity of conference descriptions, with the aim of detecting duplicates and building groups of conferences. It surveys existing online conference catalogs, analyzes their advantages and disadvantages, and substantiates the need for more thorough verification of input data about conferences. The algorithms, their software implementation, and their testing on large data sets are described using the scientometric system IAS ISTINA as an example. The algorithms make it possible to search for similar conferences by their primary descriptions when a conference is registered, to find duplicates in the database of a scientometric system, and to combine conferences held in different years into groups. The described methods can be used in developing conference catalogs and scientometric systems to improve the quality of initial data verification.

Full Text