Present and absent keyphrases extraction: an approach based on sentence embedding

Lahbib Ajallouda, Ahmed Zellou

doi:10.11591/ijeecs.v28.i3.pp1601-1612

Abstract

The keyphrases of a document are expressions from which we can learn its content without having to read it, and automatic keyphrase extraction (AKE) is the task of identifying them. Keyphrases are exploited in many natural language processing (NLP) applications. They are often mentioned in the document (present keyphrases), but some are not (absent keyphrases). In the field of AKE, researchers have exploited many techniques, such as statistical calculation, deep learning algorithms, graph representation, and sentence embedding. Approaches based on embedding techniques calculate the similarity between a document and each candidate keyphrase, and the candidates most similar to the document are considered keyphrases. Representing the document by a single vector, however, degrades the performance of these approaches, especially on long documents. In addition, these methods cannot generate absent keyphrases. To overcome these problems, this paper proposes an unsupervised approach to AKE based on the universal sentence encoder (USE), which is used to represent candidate keyphrases and the parts of the document likely to contain keyphrases. Our method also generates keyphrases not mentioned in the text. We compared the performance of the proposed approach with other methods based on embedding techniques; the results show that our approach performs better, especially on long documents.
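The general idea of embedding-based ranking described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for a sentence encoder such as USE, and scoring a candidate by its best match over the document parts (`max`) is an assumption made here for illustration. The function names and the sample data are hypothetical. Generation of absent keyphrases is not covered.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector.

    Assumption: a real system would call a model such as USE here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(document_parts, candidates):
    """Rank candidates by their best similarity to any document part.

    Comparing against several parts, rather than one vector for the
    whole document, is the point being illustrated; using max as the
    aggregator is an assumption of this sketch.
    """
    part_vecs = [embed(p) for p in document_parts]
    scored = []
    for cand in candidates:
        cand_vec = embed(cand)
        score = max(cosine(cand_vec, pv) for pv in part_vecs)
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data for illustration only.
parts = [
    "Sentence embedding techniques represent text as dense vectors.",
    "Keyphrase extraction selects phrases that summarize a document.",
]
candidates = ["sentence embedding", "keyphrase extraction", "weather forecast"]
ranking = rank_candidates(parts, candidates)
```

In this toy run, the two candidates that overlap with a document part score above the unrelated one, which ends up last. With a real encoder, the vectors would be dense and the similarity would capture meaning, not just shared words.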
