Exploiting extensible background knowledge for clustering-based automatic keyphrase extraction

Hassan Alrehamy, Coral Walker

doi:10.1007/s00500-018-3414-4

Journal: Soft Computing
Publication Date: Aug 16, 2018
Citations: 13
License type: open-access

Affiliation: University of Babylon, Cardiff University

Abstract

Keyphrases are single- or multi-word phrases that describe the essential content of a document. Keyphrase extraction methods often use an external knowledge source such as WordNet to obtain relation information about terms and thereby improve their results, but a single knowledge source is often limited in coverage. This problem is identified as the coverage limitation problem. In this paper, we introduce SemCluster, a clustering-based unsupervised keyphrase extraction method that addresses the coverage limitation problem by using an extensible approach that integrates an internal ontology (i.e., WordNet) with other knowledge sources to gain wider background knowledge. SemCluster is evaluated against three unsupervised methods, TextRank, ExpandRank, and KeyCluster, using the F1-measure. The evaluation results demonstrate that SemCluster has better accuracy and computational efficiency and is more robust when dealing with documents from different domains.

Highlights

Keyphrases are single- or multi-word expressions that describe the essential content of a document.
Two frequently used datasets in the automatic keyphrase extraction (AKE) literature are chosen as the evaluation datasets: Inspec (Hulth 2003) and DUC-2001. Both datasets consist of free-text documents with manually assigned keyphrases, differ in length and domain, and are appropriate for testing the robustness of SemCluster's AKE performance over documents that belong to different domains.
We have introduced SemCluster, a clustering-based unsupervised keyphrase extraction method.

Summary

Introduction

Keyphrases are single- or multi-word expressions that describe the essential content of a document.

All the type classes associated with the external senses of a term t_i in KB_x are mapped to their corresponding synsets in O and are considered hypernyms of t_i. The synset that corresponds to the deepest type class in the schema ontology of KB_x is considered the correct hypernym of the external sense. With this construct, SemCluster can dynamically generate appropriate senses for terms that are absent from WordNet, or even expand the set of synsets for an existing term.

To illustrate with a real-world example, we consider extending O with DBpedia (i.e., KB_DBpedia) and aligning the type classes in its schema ontology with their equivalent WordNet synsets. The third sense in particular, “Ben Johnson (Sprinter),” is associated with four type classes as depicted in Fig. 4: “owl:Thing,” “dbo:Agent,” “sc:Person,” “dbo:Athlete.” According to the querying algorithm, the deepest among the four classes, “dbo:Athlete,” becomes the hypernym of the third sense and is referred to as “wn:Athlete#n1.”
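
The deepest-type-class rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not SemCluster's actual querying algorithm: the `PARENT` chain assumes the four classes form a single path in the order listed for Fig. 4, the function names `depth` and `deepest_hypernym` are hypothetical, and every entry in `ALIGNMENT` other than "dbo:Athlete" → "wn:Athlete#n1" is a placeholder that does not come from the paper.

```python
from typing import Dict, List, Optional

# Assumed parent links for a fragment of the schema ontology. The ordering
# (Thing > Agent > Person > Athlete) is an assumption based on the order in
# which the four classes are listed in the text.
PARENT: Dict[str, Optional[str]] = {
    "owl:Thing": None,
    "dbo:Agent": "owl:Thing",
    "sc:Person": "dbo:Agent",
    "dbo:Athlete": "sc:Person",
}

# Alignment from type classes to WordNet synsets. Only the dbo:Athlete entry
# is taken from the text; the other identifiers are placeholders.
ALIGNMENT: Dict[str, str] = {
    "owl:Thing": "wn:Entity#n1",    # placeholder
    "dbo:Agent": "wn:Agent#n1",     # placeholder
    "sc:Person": "wn:Person#n1",    # placeholder
    "dbo:Athlete": "wn:Athlete#n1", # from the text
}


def depth(type_class: str, parent: Dict[str, Optional[str]]) -> int:
    """Count the number of parent links between type_class and the root."""
    steps = 0
    current = parent[type_class]
    while current is not None:
        steps += 1
        current = parent[current]
    return steps


def deepest_hypernym(
    type_classes: List[str],
    parent: Dict[str, Optional[str]],
    alignment: Dict[str, str],
) -> str:
    """Return the synset aligned with the deepest of the given type classes."""
    deepest = max(type_classes, key=lambda c: depth(c, parent))
    return alignment[deepest]


# Type classes attached to the external sense "Ben Johnson (Sprinter)".
sense_classes = ["owl:Thing", "dbo:Agent", "sc:Person", "dbo:Athlete"]
hypernym = deepest_hypernym(sense_classes, PARENT, ALIGNMENT)
print(hypernym)
```

Under these assumptions the sense resolves to the synset aligned with "dbo:Athlete", matching the example in the text. A real schema ontology may allow multiple parents or ties in depth, which this sketch does not handle.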
