Abstract

Automatic keyword extraction selects words or phrases from a document that express its main idea. In this paper, we propose an unsupervised keyword extraction framework for individual documents that improves extraction in two respects. In the candidate selection step, we apply stopword removal, regular-expression matching, and length filtering to reduce the number of candidate keywords while improving their quality. In the word scoring step, we weight the edges of the graph model using word co-occurrence, semantic relationships (WordNet, word embeddings, Normalized Google Distance), and three ways of combining co-occurrence with semantic relationships. In the experiments, we use precision, recall, and F1-measure to compare all of our proposed methods with strong baseline methods on two datasets. The results show that methods under our framework perform well. They also confirm that combining word co-occurrence with semantic relationships extracts keywords more effectively than using either one alone. We further find that, for individual documents, word co-occurrence is more effective than semantic relationships.
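The two steps and the evaluation criteria described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the regular expression, the `min_len` and `window` parameters, and the weighted PageRank-style update (in the spirit of TextRank) are all assumptions made for illustration, and only the co-occurrence edge weights are shown, not the semantic or combined variants.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; the paper's list is not given here).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on", "with", "that", "are"}

def select_candidates(text, min_len=3):
    """Candidate selection: regex tokenization, stopword removal, length filtering."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]

def cooccurrence_graph(tokens, window=2):
    """Undirected graph; an edge weight counts co-occurrences within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                graph[u][v] += 1.0
                graph[v][u] += 1.0
    return graph

def rank_words(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style scoring (assumed here, not the paper's exact formula)."""
    scores = {w: 1.0 for w in graph}
    # Total edge weight attached to each word, used to normalize outgoing weight.
    strength = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(wt / strength[n] * scores[n] for n, wt in graph[w].items())
            for w in graph
        }
    return scores

def extract_keywords(text, top_k=5):
    """Run both steps and return the `top_k` highest-scoring words."""
    tokens = select_candidates(text)
    scores = rank_words(cooccurrence_graph(tokens))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 of predicted keywords against gold keywords."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1
```

A semantic or combined variant would change only how the edge weights in `cooccurrence_graph` are computed; the ranking and evaluation functions would stay the same.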

Highlights

  • Automatic keyword extraction (AKE) is a class of methods that automatically capture the theme of a document using a small set of words that occur in it

  • AKE is widely used in many natural language processing (NLP) tasks, such as Text Classification (TC) [1], Document Summarization (DS) [2], [3], and Information Retrieval (IR) [4], [25], among others. For an IR system, keywords can be used to index documents and improve the accuracy of retrieval results

  • Our experiments verify that combining co-occurrence and semantic relationships between words improves the effectiveness of keyword extraction


Summary

Introduction

Automatic keyword extraction (AKE) is a class of methods that automatically capture the theme of a document using a small set of words that occur in it. In the age of the "information explosion", AKE helps people learn quickly from the ocean of documents. AKE is widely used in many natural language processing (NLP) tasks, such as Text Classification (TC) [1], Document Summarization (DS) [2], [3], and Information Retrieval (IR) [4], [25], among others. For an IR system, keywords can be used to index documents and improve the accuracy of retrieval results. Keywords can be seen as a condensed summary of a document.

