AdaptiveUKE: Towards adaptive unsupervised keyphrase extraction with gated topic modeling

Qi Liu, Wenjun Ke, Xiaoguang Yuan, Yuting Yang, Hua Zhao, Peng Wang

doi:10.1016/j.eswa.2024.123926

Abstract

Keyphrase extraction aims to identify the words or phrases that convey the main purport of a free-format natural language document, and it requires the extracted keyphrases to be concise and informative. Existing works mainly encode the document and phrases into the same vector space and score keyphrases directly with similarity measures. However, such methods have two major limitations: (1) The semantic richness of documents in real-world scenarios varies widely, and it is difficult to model such diversity when a document is represented by a single embedding. (2) Using similarity measures alone to score keyphrases fails to detect phrases from all related topics, leading to semantic over-concentration in the extracted results. To address these challenges, we propose AdaptiveUKE, a simple yet effective model for adaptive unsupervised keyphrase extraction. Firstly, to adapt to the varying degree of document semantic richness, we propose a novel gated topic modeling strategy that allows each topic to be assigned independently rather than in a competitive fashion. Secondly, we design a topic-guided scoring algorithm to extract keyphrases under unsupervised conditions. Fusing the importance and relatedness of each topic enables efficient ranking of candidate phrases. Finally, we conduct extensive experiments on six widely used datasets. The results show that AdaptiveUKE achieves highly competitive results across several keyphrase extraction benchmarks, surpassing the state of the art on Inspec, DUC2001, and SemEval2010 and reaching the current best performance on SemEval2017, Theses100, and Krapivin2009. Our code is released at https://github.com/NLPCodebase/AdaptiveUKE.
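As a rough illustration of the two ideas summarized in the abstract, the sketch below contrasts competitive topic assignment (softmax, where topic weights must sum to one) with independent gating (one sigmoid per topic), and then ranks candidate phrases by a weighted sum of their relatedness to each topic. It is a minimal sketch under stated assumptions, not the released AdaptiveUKE implementation: the function names, the use of a sigmoid as the gate, cosine similarity as the relatedness measure, and the toy vectors are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    # Competitive assignment: weights sum to 1, so topics share a fixed budget.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_gates(scores):
    # Independent assignment (assumed gate form): each topic gets its own
    # weight in (0, 1), so several topics can be strongly active at once.
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def cosine(u, v):
    # Cosine similarity, used here as a stand-in relatedness measure.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_phrases(phrase_vecs, topic_vecs, topic_weights):
    # Hypothetical topic-guided score: fuse topic importance (the weight)
    # with phrase-topic relatedness (the cosine), summed over topics.
    scored = {
        phrase: sum(w * cosine(vec, t) for w, t in zip(topic_weights, topic_vecs))
        for phrase, vec in phrase_vecs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Toy document-topic scores: two topics are relevant, one is not.
doc_topic_scores = [2.0, 2.0, -3.0]
competitive = softmax(doc_topic_scores)
gated = sigmoid_gates(doc_topic_scores)

# Toy 2-D embeddings for two topics and two candidate phrases.
topics = [[1.0, 0.0], [0.0, 1.0]]
phrases = {"topic modeling": [1.0, 0.0], "page layout": [0.0, 1.0]}
ranking = rank_phrases(phrases, topics, sigmoid_gates([2.0, -3.0]))
```

With softmax, the two relevant topics in the toy scores must split the weight between them, whereas the sigmoid gates let both stay close to one. This is one simple way to picture how a semantically rich document could keep several topics active, under the assumptions stated above.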

Full Text