Abstract

The number of published scholarly articles has grown exponentially in recent years. This growth makes it harder for academic researchers to locate the most relevant papers in their fields of interest, for several reasons. There is the fundamental problem of synonymy and polysemy. Query terms may be too short to distinguish between papers. A new researcher also has limited knowledge and is often unsure what she is looking for until the results are displayed. These issues prevent scholarly retrieval systems from locating highly relevant publications for a given search query. Researchers have sought to tackle these issues; however, the user's intent cannot be addressed entirely by a direct information retrieval technique alone. This paper proposes a novel approach that combines query expansion and citation analysis to support scholarly search. It is a two-stage academic search process. In the first stage, upon receiving the initial search query, the retrieval system provides a ranked list of results. In the second stage, the highest-scoring Term Frequency–Inverse Document Frequency (TF-IDF) terms are extracted from a few top-ranked papers and used to expand the query behind the scenes. In both stages, citation analysis further refines the quality of the academic search. The originality of the approach lies in the combined use of query expansion by pseudo relevance feedback and citation network analysis, which may bring the most relevant papers to the top of the search results list. The approach is evaluated on the ACL dataset. The experimental results show that the technique is effective and robust for locating relevant papers in terms of normalized Discounted Cumulative Gain (nDCG), precision, and recall.
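
The second stage described above (taking the highest-scoring TF-IDF terms from a few top-ranked papers and appending them to the query) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (tfidf_scores, rank, expand_query), the parameters top_k and n_terms, the tokenizer, the smoothed idf formula, and the toy first-stage ranking are all assumptions made for illustration. Citation analysis and stop-word removal are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document, using a smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append({
            t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return scores


def rank(query, doc_scores):
    """Toy first stage: order documents by summed tf-idf of the query terms."""
    q_terms = tokenize(query)
    totals = [sum(s.get(t, 0.0) for t in q_terms) for s in doc_scores]
    return sorted(range(len(doc_scores)), key=lambda i: totals[i], reverse=True)


def expand_query(query, docs, top_k=2, n_terms=3):
    """Pseudo relevance feedback: assume the top_k results are relevant and
    append their n_terms highest-scoring terms that are not already in the query."""
    doc_scores = tfidf_scores(docs)
    feedback_docs = rank(query, doc_scores)[:top_k]
    pooled = Counter()
    for i in feedback_docs:
        pooled.update(doc_scores[i])  # sums tf-idf weights across feedback docs
    original = set(tokenize(query))
    new_terms = [t for t, _ in pooled.most_common() if t not in original][:n_terms]
    return " ".join([query] + new_terms), feedback_docs


# Hypothetical toy corpus, not taken from the ACL dataset.
papers = [
    "query expansion with pseudo relevance feedback for retrieval",
    "citation network analysis for ranking scholarly papers",
    "pseudo relevance feedback improves query expansion",
    "neural machine translation with attention",
]
expanded_query, feedback = expand_query("query expansion", papers)
print(expanded_query, feedback)
```

In a fuller pipeline the expanded query would be submitted again to the retrieval system, and both result lists would be refined by citation analysis as the abstract describes; how that refinement is done is not specified in this excerpt, so it is left out here.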

Highlights

  • Publications appear at a rate of about 2.5 million per year [1]

  • We look at Pseudo Relevance Feedback (PRF)-based query expansion (QE) methods in greater detail

  • The microscopic analysis supports our hypothesis that incorporating QE through PRF and citation network analysis can support academic searchers amid today’s colossal expansion of academic literature

Summary

Introduction

Publications appear at a rate of about 2.5 million per year [1]. This large increase in the number of scholarly publications makes finding relevant papers with a query of only a few keywords a challenging task [2]. Several reasons contribute. First, there is the problem of synonymy and polysemy [3, 4]: the submitted query terms can relate to multiple topics, so the search results list may not contain the intended papers. Moreover, even if searchers know what they are looking for, they are often unable to formulate a search query that increases the accuracy and completeness of the search results. These issues prevent scholarly retrieval systems from locating highly relevant publications for a given search query. To the best of our knowledge, no technique has been investigated that uses both query expansion (QE) by pseudo relevance feedback (PRF) and citation analysis in scholarly search.
