Abstract

Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is in providing search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as for recommending keywords for the authors of new papers. We create a data set of research papers, and their citation network, keywords, and other metadata, containing over 470K papers with and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.