Abstract

Search result diversification based on topic proportionality considers a document as a bag of weighted topics and aims to reorder or down-sample a ranked list in a way that maintains topic proportionality. The goal is to show the topic distribution from an ambiguous query at all points in the revised list, hoping to satisfy all users in expectation. One effective approach, PM-2, greedily selects the best topic that maintains proportionality at each ranking position and then selects the document that best represents that topic. From a theoretical perspective, this approach does not provide any guarantee that topic proportionality holds in the small ranked list. Moreover, this approach does not take query-document relevance into account. We propose a Linear Programming (LP) formulation, LP-QL, that maintains topic proportionality and simultaneously maximizes relevance. We show that this approach satisfies topic proportionality constraints in expectation. Empirically, it achieves a 5.5% performance gain (significant) in terms of alpha-NDCG compared to PM-2 when we use LDA as the topic modelling approach. Furthermore, we propose LP-PM-2 that integrates the solution of LP-QL with PM-2. LP-PM-2 achieves 3.2% performance gain (significant) over PM-2 in terms of alpha-NDCG with term based topic modeling approach. All of our experiments are based on a popular web document collection, ClueWeb09 Category B, and the queries are taken from TREC Web Track's diversity task.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call