Abstract

Traditionally, keyphrases (or keywords) have been manually assigned to documents by their authors or by human indexers. This has become impractical given the massive number of documents, particularly short articles (e.g. microblogs, abstracts, snippets), added to the Internet each day, creating a need for systems that automatically extract keyphrases from documents. Automatic keyphrase extraction methods have generally taken either a supervised or an unsupervised approach. Supervised methods extract keyphrases by using a training document set, thus acquiring knowledge from a global collection of texts. Conversely, unsupervised methods rank phrases by their importance in a single-document context, without prior learning. We present a hybrid keyphrase extraction method for short articles, HybridRank, which leverages the benefits of both approaches. Our system implements modified versions of the TextRank [6] (unsupervised) and KEA [16] (supervised) methods, and applies a merging algorithm to produce an overall list of keyphrases. We have tested HybridRank on more than 900 abstracts covering a wide variety of subjects, including engineering, science, physics and IT, and show its superior effectiveness. We observe that knowledge collaboration between supervised and unsupervised methods can produce higher-quality keyphrases than applying these methods individually.
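
The abstract does not specify how the two ranked lists are merged. Purely as an illustration of what combining an unsupervised and a supervised ranking might look like, the sketch below uses reciprocal rank fusion, a generic rank-aggregation technique substituted here and not HybridRank's own merging algorithm. The function name `merge_rankings`, the constant `k = 60`, and the sample phrase lists are all hypothetical.

```python
from collections import defaultdict


def merge_rankings(rankings, k=60, top_n=5):
    """Fuse ranked keyphrase lists with reciprocal rank fusion (RRF).

    Assumption: this is a stand-in for the paper's unspecified merging
    algorithm, not a reproduction of it.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        # Position 1 is the best-ranked phrase in each list.
        for position, phrase in enumerate(ranked, start=1):
            scores[phrase.lower()] += 1.0 / (k + position)
    # Highest fused score first; ties are broken alphabetically.
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return ordered[:top_n]


# Hypothetical outputs of an unsupervised and a supervised extractor.
textrank_list = ["keyphrase extraction", "short articles", "ranking"]
kea_list = ["keyphrase extraction", "training set", "short articles"]

merged = merge_rankings([textrank_list, kea_list])
print(merged)
```

Phrases that both lists rank highly rise to the top of the fused list, which is one simple way the two approaches could reinforce each other.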

