Abstract
Keyphrases are single- or multi-word phrases that describe the essential content of a document. Keyphrase extraction methods often use an external knowledge source such as WordNet to obtain relation information about terms and thereby improve their results, but a single knowledge source is often limited in coverage. This problem is identified as the coverage limitation problem. In this paper, we introduce SemCluster, a clustering-based unsupervised keyphrase extraction method that addresses the coverage limitation problem by using an extensible approach that integrates an internal ontology (i.e., WordNet) with other knowledge sources to gain wider background knowledge. SemCluster is evaluated against three unsupervised methods, TextRank, ExpandRank, and KeyCluster, using the F1-measure. The evaluation results demonstrate that SemCluster has better accuracy and computational efficiency and is more robust when dealing with documents from different domains.
Highlights
Keyphrases are single- or multi-word expressions that describe the essential content of a document
Two frequently used datasets in the automatic keyphrase extraction (AKE) literature are chosen as the evaluation datasets: Inspec (Hulth 2003) and DUC-2001. Both datasets consist of free-text documents with manually assigned keyphrases; they differ in length and domain and are therefore appropriate for testing the robustness of SemCluster's AKE performance over documents that belong to different domains
We have introduced SemCluster, a clustering-based unsupervised keyphrase extraction method
Summary
Keyphrases are single- or multi-word expressions that describe the essential content of a document. All the type classes associated with external senses of t_i in KB_x are mapped into their corresponding synsets in O and are considered hypernyms of t_i. The synset that corresponds to the deepest type class in the schema ontology of KB_x is considered the correct hypernym of the external sense. With this construct, we allow SemCluster to dynamically generate appropriate senses for terms that are absent in WordNet, or even to expand the set of synsets for an existing term. To illustrate with a real-world example, we consider extending O with DBpedia (i.e., KB_DBpedia) and aligning the type classes in its schema ontology with their equivalent WordNet synsets. The third sense in particular, “Ben Johnson (Sprinter),” is associated with four type classes as depicted in Fig. 4: “owl:Thing,” “dbo:Agent,” “sc:Person,” “dbo:Athlete.” According to the querying algorithm, the deepest among the four classes, “dbo:Athlete,” becomes the hypernym of the third sense and is referred to as “wn:Athlete#n1.”
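The deepest-type-class rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the parent links in PARENT, the helper names depth and hypernym_for_sense, and the synset label for "sc:Person" are assumptions made for the example. Only the four type classes and the label "wn:Athlete#n1" come from the text.

```python
# Assumed fragment of a schema ontology: child type class -> parent type class.
# The exact parent links are hypothetical; the text only says that
# "dbo:Athlete" is the deepest of the four classes.
PARENT = {
    "dbo:Agent": "owl:Thing",
    "sc:Person": "dbo:Agent",
    "dbo:Athlete": "sc:Person",
}

# Assumed alignment of type classes to WordNet synset labels.
# Only "wn:Athlete#n1" appears in the text; the other label is a placeholder.
ALIGNMENT = {
    "sc:Person": "wn:Person#n1",
    "dbo:Athlete": "wn:Athlete#n1",
}


def depth(type_class, parent=PARENT):
    """Count the steps from a type class up to the root of the schema."""
    steps = 0
    while type_class in parent:
        type_class = parent[type_class]
        steps += 1
    return steps


def hypernym_for_sense(type_classes, parent=PARENT, alignment=ALIGNMENT):
    """Return the synset aligned with the deepest type class of a sense.

    Returns None when the deepest class has no aligned synset.
    """
    deepest = max(type_classes, key=lambda c: depth(c, parent))
    return alignment.get(deepest)


# Type classes listed in the text for the sense "Ben Johnson (Sprinter)".
sense_classes = ["owl:Thing", "dbo:Agent", "sc:Person", "dbo:Athlete"]
print(hypernym_for_sense(sense_classes))
```

Under these assumed parent links, "dbo:Athlete" sits three steps below "owl:Thing", so its aligned synset is the one selected, which matches the outcome described for the third sense.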