MICRank: Multi-information interconstrained keyphrase extraction

Ran Bai, Fang'Ai Liu, Xuqiang Zhuang, Yaoyao Yan

doi:10.1016/j.eswa.2024.123744

Abstract

Keyphrase extraction is the task of automatically identifying the words or phrases that capture the main content of an article. It supports various downstream tasks, including text search, text clustering, and text classification. Embedding-based methods for keyphrase extraction have shown promising results: they use pre-trained language models to represent the document and each candidate phrase separately, then rank the candidate phrases by the cosine similarity between the document embedding and the candidate phrase embeddings. However, such methods have two main shortcomings: (I) redundancy errors, where partially overlapping candidate keyphrases lead the methods to select redundant long phrases as keyphrases; and (II) low keyphrase coverage, where keyphrases that describe locally important information are ignored. In this paper, we propose an unsupervised keyphrase extraction method called “MICRank”, which evaluates the importance of candidate keyphrases from three perspectives (global information, local information, and attribute information) and thereby addresses both issues. Experimental results on six benchmarks demonstrate that MICRank outperforms state-of-the-art unsupervised keyphrase extraction methods. In addition, this paper refines the criterion for judging whether an extracted keyphrase is correct and introduces a new evaluation metric, S1@M (M ∈ {5,10,15}), to address the issue of synonyms being counted as incorrect predictions.
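As an illustration of the embedding-based baseline that the abstract describes (not of MICRank itself), the following minimal sketch ranks candidate phrases by cosine similarity to a document vector. The three-dimensional vectors are hand-picked toy values standing in for embeddings from a pre-trained language model, and the helper names `cosine` and `rank_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(doc_vec, cand_vecs, top_k=5):
    """Return the top_k (phrase, score) pairs, highest similarity first."""
    scored = [(phrase, cosine(doc_vec, vec)) for phrase, vec in cand_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumption): real systems would obtain these from a
# pre-trained language model rather than writing them by hand.
doc_vec = [0.9, 0.1, 0.3]
cand_vecs = {
    "keyphrase extraction": [0.8, 0.2, 0.3],
    "unsupervised keyphrase extraction method": [0.85, 0.1, 0.35],
    "text clustering": [0.1, 0.9, 0.2],
}

ranked = rank_candidates(doc_vec, cand_vecs, top_k=2)
for phrase, score in ranked:
    print(f"{score:.4f}  {phrase}")
```

With these toy values the longer, partially overlapping phrase scores slightly above the shorter one, which is the kind of redundancy error named in shortcoming (I); a ranking based only on similarity to the whole document has no mechanism for preferring the concise phrase.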

Full Text