Abstract

A search query, being a very concise grounding of user intent, could potentially have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such possibilities, so that the user is likely to be satisfied whatever her intended interpretation. Diversified Query Expansion is the problem of diversifying query expansion suggestions, so that the user can specialize the query to better suit her intent even before perusing search results. In this paper, we consider the usage of semantic resources and tools to arrive at improved methods for diversified query expansion. In particular, we develop two methods, one that leverages Wikipedia and another that leverages pre-learnt distributional word embeddings. Both approaches operate on a common three-phase framework: first selecting a set of informative terms from the search results of the initial query, then building a graph, and finally using a diversity-conscious node ranking to prioritize candidate terms for diversified query expansion. Our methods differ in the second phase. The first method, Select-Link-Rank (SLR), links terms with Wikipedia entities to accomplish graph construction; the second method, Select-Embed-Rank (SER), constructs the graph using similarities between distributional word embeddings. Through an empirical analysis and user study, we show that SLR outperforms state-of-the-art diversified query expansion methods, thus establishing that Wikipedia is an effective resource to aid diversified query expansion. Our empirical analysis also illustrates that SER outperforms the baselines convincingly, asserting that it is the best available method for those cases where SLR is not applicable; these include narrow-focus search systems where a relevant knowledge base is unavailable. Our SLR method is also seen to outperform a state-of-the-art method in the task of diversified entity ranking.
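The three-phase framework described in the abstract can be illustrated with a minimal sketch of an SER-style pipeline. Everything concrete here is an assumption made for illustration and is not taken from the paper: term frequency stands in for "informative" term selection, a cosine-similarity threshold decides graph edges, and a simple greedy neighbour-damping rule stands in for the diversity-conscious node ranking. The function names, toy snippets, and two-dimensional embeddings are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_terms(snippets, stopwords, k):
    """Phase 1 (assumed): keep the k most frequent non-stopword terms with their counts."""
    counts = Counter(
        w for s in snippets for w in s.lower().split() if w not in stopwords
    )
    return dict(counts.most_common(k))


def build_graph(terms, embeddings, threshold):
    """Phase 2 (SER-style, assumed): link terms whose embedding similarity meets a threshold."""
    graph = {t: {} for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def rank_diverse(graph, base_score, n, penalty=0.5):
    """Phase 3 (assumed stand-in): greedy pick, then damp the scores of the pick's neighbours."""
    scores = dict(base_score)
    ranked = []
    while scores and len(ranked) < n:
        best = max(scores, key=scores.get)
        ranked.append(best)
        del scores[best]
        for neighbour, weight in graph[best].items():
            if neighbour in scores:
                scores[neighbour] *= 1 - penalty * weight
    return ranked


# Toy search-result snippets for the query "python" (hypothetical data).
snippets = [
    "python programming courses",
    "python programming books courses",
    "python programming",
    "python snake habitat",
    "python snake",
]
# Toy 2-D embeddings: first axis ~ software sense, second axis ~ animal sense.
embeddings = {
    "programming": [1.0, 0.0],
    "courses": [0.95, 0.1],
    "books": [0.9, 0.2],
    "snake": [0.0, 1.0],
    "habitat": [0.1, 0.95],
}

term_scores = select_terms(snippets, stopwords={"python"}, k=5)
graph = build_graph(list(term_scores), embeddings, threshold=0.8)
ranked = rank_diverse(graph, term_scores, n=3)
print(ranked)
```

With these toy values, the second suggestion comes from the animal sense of the query rather than from a near-duplicate of the first pick, which is the behaviour a diversity-conscious ranking is meant to produce.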

Highlights

  • Users of a search system may choose the same initial search query for varying information needs

  • Another work [19] proposes scoring candidate query expansion terms using the similarity of their word embeddings to those of the terms in the query. While neither of these methods incorporates a mechanism for diversification, we extend the latter model, called RM-CombSum, with a Maximum Marginal Relevance (MMR) [5] based diversification, leading to a word-embedding based diversified query expansion method that we will use as a baseline method in our empirical evaluation

  • We considered the task of leveraging external semantic resources for the Diversified Query Expansion task
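The MMR-based diversification mentioned in the highlights above can be sketched as a greedy selection over candidate expansion terms. This is a minimal illustration under stated assumptions, not the baseline as implemented in the paper: relevance is taken to be the mean cosine similarity to the query term embeddings (a simplified stand-in, not RM-CombSum itself), and the function name `mmr_expansion`, the trade-off parameter `lam`, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_expansion(query_vecs, candidates, n, lam=0.5):
    """Greedy MMR: balance relevance to the query against redundancy with earlier picks.

    query_vecs: list of embedding vectors, one per query term.
    candidates: dict mapping candidate term -> embedding vector.
    lam: 1.0 means pure relevance; lower values weight diversity more.
    """
    # Assumed relevance: mean similarity of the candidate to the query terms.
    relevance = {
        term: sum(cosine(vec, q) for q in query_vecs) / len(query_vecs)
        for term, vec in candidates.items()
    }
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < n:

        def mmr_score(term):
            # Redundancy is the highest similarity to any term already selected.
            redundancy = max(
                (cosine(candidates[term], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[term] - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings for the query "python" and three candidate terms (hypothetical data).
query_vecs = [[1.0, 1.0]]
candidates = {
    "programming": [1.0, 0.3],
    "coding": [1.0, 0.25],
    "snake": [0.2, 1.0],
}

diversified = mmr_expansion(query_vecs, candidates, n=2, lam=0.5)
relevance_only = mmr_expansion(query_vecs, candidates, n=2, lam=1.0)
print(diversified, relevance_only)
```

Setting `lam=1.0` reduces the procedure to plain relevance ranking, which makes it easy to compare the diversified and non-diversified suggestions on the same candidates.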

Summary

Introduction

Users of a search system may choose the same initial search query for varying information needs. Such difficulties in covering long-tail aspects, as noted in [2], led to research interest in a slightly different task attacking the same larger goal, that of Diversified Query Expansion (DQE). For an unambiguous query such as python programming, there are many aspects, depending on whether the user is interested in books, software or courses. For another seemingly unambiguous query, india, the aspects of interest could include railways, maps, news and cricket.

