Abstract

Keyphrases provide core information that helps users understand a document. Most previous work uses machine-learning-based methods for keyphrase extraction and achieves promising performance. However, these methods focus on identifying keyphrases in the input text and cannot produce keyphrases that do not appear in it. In this paper, we present an encoder-decoder framework that incorporates a copying mechanism to generate keyphrases for a given text. This framework (CopyNet) integrates a generation part and a copying part. The generation part generates keyphrases from a predefined vocabulary, and the copying part takes keyphrases from the source text. Furthermore, we improve CopyNet by using different probabilities for the two parts. To incorporate more related information for keyphrase generation, an automatically built keyphrase semantic web is merged into the dataset and takes part in training the neural network. Semantic-similarity-based and word-co-occurrence-based methods are used to construct the keyphrase semantic web. We build a large-scale biomedical keyphrase dataset to evaluate system performance. Experiments show that our improved CopyNet achieves better performance with different proportions of the generation and copying parts, and that incorporating the semantic web also effectively improves keyphrase generation.

Highlights

  • Keyphrases are the basic units for expressing the semantic information of the document

  • We present CopyNet to generate keyphrases based on the semantic information and the important text information of the source document

  • In this paper, we presented an encoder-decoder framework with a copying mechanism to generate keyphrases for a given text

Summary

INTRODUCTION

Keyphrases are the basic units for expressing the semantic information of a document. The traditional encoder-decoder framework faces the out-of-vocabulary (OOV) problem because it generates keyphrases only from the vocabulary. Since some of these OOV keyphrases may occur in the source text, we introduce a copying mechanism into this framework to select some important segments as keyphrases. The encoder-decoder framework with attention and copying mechanisms (CopyNet) generates keyphrases based on the semantic information and the important text information of the source document. To leverage important information from the source text, some works explore methods to copy appropriate parts of the input sequence into the output sequence [35]–[37].
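
As a rough illustration of why copying helps with OOV words, the sketch below mixes a generation distribution over a fixed vocabulary with a copy distribution over the source tokens. It is a simplified convex mixture of two separately normalized distributions, not necessarily the formulation used in the paper. The function name `mix_generate_and_copy`, the `copy_weight` parameter, and all scores and tokens are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_generate_and_copy(vocab, source_tokens, gen_scores, copy_scores,
                          copy_weight=0.5):
    """Combine generation and copy probabilities into one distribution.

    copy_weight is an assumed hyperparameter in [0, 1]: the share of
    probability mass given to the copying part.
    """
    if len(gen_scores) != len(vocab):
        raise ValueError("need one generation score per vocabulary word")
    if len(copy_scores) != len(source_tokens):
        raise ValueError("need one copy score per source token")
    if not 0.0 <= copy_weight <= 1.0:
        raise ValueError("copy_weight must lie in [0, 1]")

    gen_probs = softmax(gen_scores)
    copy_probs = softmax(copy_scores)

    mixed = {}
    # Generation part: mass over the predefined vocabulary.
    for token, p in zip(vocab, gen_probs):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - copy_weight) * p
    # Copying part: mass over source tokens, including OOV words.
    # A token that is in both sets accumulates mass from both parts.
    for token, p in zip(source_tokens, copy_probs):
        mixed[token] = mixed.get(token, 0.0) + copy_weight * p
    return mixed


# Toy example: "BRCA1" is not in the vocabulary but occurs in the source.
vocab = ["protein", "gene", "<unk>"]
source_tokens = ["BRCA1", "gene", "mutation"]
probs = mix_generate_and_copy(
    vocab, source_tokens,
    gen_scores=[1.0, 2.0, 0.5],
    copy_scores=[3.0, 1.0, 0.5],
    copy_weight=0.6,
)
best = max(probs, key=probs.get)
print(best)  # BRCA1
```

With `copy_weight=0.0` the sketch falls back to vocabulary-only generation, so the OOV token can never be selected. Raising the weight shifts probability mass toward segments of the source text.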

METHODS
TASK FORMALIZATION
ATTENTION MECHANISM
SEMANTIC WEB CONSTRUCTION
Findings
CONCLUSION