Keyphrase extraction is a fundamental and challenging task that aims to extract a set of keyphrases from textual documents. Keyphrases help publishers index documents and help readers identify the most relevant ones. They are short phrases, composed of one or more terms, that represent a textual document and its main topics. In this article, we extend our research on C-Rank, an unsupervised approach that automatically extracts keyphrases from single documents. C-Rank uses concept-linking to link concepts shared between a single document and an external background knowledge base. We advance our study of C-Rank by evaluating it with different concept-linking approaches: Babelfy and DBpedia Spotlight. We evaluated C-Rank on data sets composed of academic articles, academic abstracts, and news articles. An experimental comparison with existing unsupervised approaches indicates that C-Rank achieves state-of-the-art results when extracting keyphrases from scientific documents.