Abstract

An essential part of a text generation task is extracting critical information from the text. People usually obtain this information through manual extraction; however, the gap between the capacity for manual processing and the speed of information growth makes this impractical. Automatic keyphrase extraction can solve this problem. In this paper, we summarize the mainstream unsupervised keyphrase extraction methods, analyze in detail the reasons for the differences in their performance, and provide some solutions.

Highlights

  • With the continuous development of the information age, text-based content grows exponentially, making it more challenging to manage this large-scale information. This information could be processed manually in the past

  • Keyphrase extraction is widely used in many fields, such as natural language processing (NLP), information retrieval (IR) [9,10,11,12], opinion mining [13,14,15], document indexing [16], and document classification [17]

  • SingleRank: The graphs constructed by TextRank are unweighted, yet edge weights can reflect the strength of the semantic relationship between two nodes, so a weighted graph may work better for the keyphrase extraction task
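
The contrast between an unweighted and a weighted word graph can be illustrated with a small sketch. It is not the authors' implementation: the function names `build_cooccurrence_graph` and `rank_words`, the window size, the damping factor, and the iteration count are all assumptions chosen for illustration. Edge weights here are co-occurrence counts within a sliding window, and words are ranked with a weighted PageRank-style iteration. Only the word-ranking step is covered; candidate phrase selection and phrase scoring are left out.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2, weighted=True):
    """Build an undirected word graph as {word: {neighbor: weight}}.

    Two words are linked when they appear within `window` consecutive
    tokens. With weighted=True the edge weight is the co-occurrence count;
    with weighted=False every edge has weight 1.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            if weighted:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
            else:
                graph[word][other] = 1.0
                graph[other][word] = 1.0
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def rank_words(graph, damping=0.85, iterations=50):
    """Score words with a weighted PageRank-style power iteration."""
    n = len(graph)
    scores = {w: 1.0 / n for w in graph}
    # Total edge weight leaving each node, used to normalise its votes.
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            new_scores[w] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


tokens = "graph ranking graph ranking model".split()
weighted_scores = rank_words(build_cooccurrence_graph(tokens, weighted=True))
unweighted_scores = rank_words(build_cooccurrence_graph(tokens, weighted=False))
for word in sorted(weighted_scores):
    print(word, round(weighted_scores[word], 3), round(unweighted_scores[word], 3))
```

In this toy input, "graph" and "model" are both linked only to "ranking", so the unweighted graph cannot tell them apart. The weighted graph gives "graph" a higher score because it co-occurs with "ranking" more often.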


Summary

Introduction

With the continuous development of the information age, text-based content grows exponentially, making it more challenging to manage this large-scale information. This information could be processed manually in the past. The supervised method [18] transforms the keyphrase extraction task into a classification problem [19,20] or a regression problem [21]. It trains a model on a labeled training set and uses the trained model to determine whether a candidate word in a text is a keyphrase. We divide keyphrase extraction into the linguistic school and the statistical school. We extend this classification to commonly used metrics, features that affect keyphrase extraction, and mainstream unsupervised keyphrase extraction methods, which makes the structure and development path of the entire field clear.

What Datasets Are There in the Keyphrase Extraction Field?
What Are the Evaluation Metrics in the Keyphrase Extraction Field?
Statistics-Based Metrics
Linguistics-Based Metrics
Linguistic-Based Features
Statistical-Based Features
Linguistic-Based
Statistical-Based
Unsupervised Keyphrase Extraction Methods
Classification of Unsupervised Keyphrase Extraction Methods
Statistics-Based Methods
Graph-Based Methods
Transfer Learning-Based Methods
Clustering-Based Methods
Language Model-Based Methods
The State of the Art
Method
Analysis
The Performance of the Methods
The Impacts of Dataset on Performance
Limitation of Keyphrase Extraction Methods
The Impact of Gold Standard on Evaluation
The Impact of Manually Assigned Labels on Evaluation
Our Recommendations
Conclusions and Future Directions
