Keyphrase Generation for Abstracts of the Russian-Language Scientific Articles

D A Morozov, A V Glazkova, M A Tyutyulnikov, B L Iomdin

doi:10.25205/1818-7935-2023-21-1-54-66

Abstract

In this paper, we attempted to adapt several well-known keyword selection algorithms to a highly specific text corpus: abstracts of Russian academic papers in mathematics and computer science. We faced several challenges, including the lack of research on keyword extraction for Russian, the absence of large text corpora of academic abstracts, and the short length of the abstracts themselves. Keywords are often found in the full text of a paper and can simply be highlighted, whereas abstracts may not contain them in explicit form. At the same time, it is the abstracts that are usually publicly available, so automatically selecting keywords from them would make searching for papers considerably easier. Moreover, automatic keyword selection would be useful even for papers whose keywords were already specified by the authors. During the study, we found that authors often use keywords unique to their own papers, which complicates the systematization of papers by topic. To visualize the results, we created a web resource, keyphrases.mca.nsu.ru, where early-career scholars can compile an approximate list of keywords for their first research paper.

Full Text