Efficient query processing techniques for next-page retrieval

Joel Mackenzie, Matthias Petri, Alistair Moffat

doi:10.1007/s10791-021-09402-7

Open Access

https://doi.org/10.1007/s10791-021-09402-7

Abstract

In top-k ranked retrieval the goal is to efficiently compute an ordered list of the k highest-scoring documents according to some stipulated similarity function, such as the well-known BM25 approach. In most implementation techniques a min-heap of size k is used to track the top-scoring candidates. In this work we consider the question of how best to retrieve the second page of search results, given that a first page has already been computed; that is, identification of the documents at ranks k+1 to 2k for some query. Our goal is to understand what information is available as a by-product of the first-page scoring, and how it can be employed to accelerate the second-page computation, assuming that the second page of results is required for only a fraction of the query load. We propose a range of simple, yet efficient, next-page retrieval techniques which are suitable for accelerating Document-at-a-Time mechanisms, and demonstrate their performance on three large text collections.
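The min-heap bookkeeping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `top_k` and the input format (an iterable of `(doc_id, score)` pairs that have already been scored) are assumptions made for illustration only.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k(scored_docs: Iterable[Tuple[int, float]], k: int) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    Hypothetical helper: assumes each document has already been scored.
    """
    if k <= 0:
        return []
    # Min-heap of (score, doc_id); heap[0] is the weakest candidate kept so far.
    heap: List[Tuple[float, int]] = []
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the current k-th best, so it replaces it.
            heapq.heapreplace(heap, (score, doc_id))
    # Report in decreasing score order, breaking ties by increasing doc_id.
    return [(d, s) for s, d in sorted(heap, key=lambda e: (-e[0], e[1]))]
```

Once the heap holds k entries, the score at its root acts as an entry threshold: a document that does not exceed it cannot appear on the first page.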

Highlights

Top-k similarity search is a well-known problem in information retrieval.
Those k documents are presented in a search result page (SERP), for the
Earlier work explored user interaction patterns through large-scale query logs derived from web search systems, demonstrating that users typically browse in a top-down fashion, with the majority of clicks occurring on the first page, and the majority of users viewing only a single SERP (Silverstein et al., 1999; Jansen et al., 2000; Spink et al., 2001; Jansen & Spink, 2006).

Summary

Introduction

Top-k similarity search is a well-known problem in information retrieval. Given a collection of documents, D, and a query of q terms Q = {t1, t2, ..., tq}, the goal is to find the k highest-scoring documents in D according to a similarity function S(Q, d) (Zobel & Moffat, 2006). Earlier work explored user interaction patterns through large-scale query logs derived from web search systems, demonstrating that users typically browse in a top-down fashion, with the majority of clicks occurring on the first page, and the majority of users viewing only a single SERP (Silverstein et al., 1999; Jansen et al., 2000; Spink et al., 2001; Jansen & Spink, 2006). These observations have been confirmed in more recent studies (Costa & Silva, 2010). Most user attention is focused on the early positions of the first page, with a large drop-off across page boundaries, and only a very small percentage of users proceed to the second results page.
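To make the problem statement concrete, the following Python sketch retrieves an arbitrary results page (ranks (p-1)k+1 to pk) by scoring every document exhaustively and selecting the pk best. This is a naive baseline for illustration, not one of the techniques proposed in the paper. The names `toy_similarity` and `results_page` are hypothetical, and the similarity function is a simple term-count stand-in for S(Q, d), not BM25.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def toy_similarity(query: Sequence[str], doc_terms: Sequence[str]) -> float:
    """Hypothetical stand-in for S(Q, d): total occurrences of the query terms."""
    return float(sum(doc_terms.count(t) for t in query))


def results_page(
    docs: Dict[int, Sequence[str]], query: Sequence[str], k: int, page_no: int
) -> List[Tuple[int, float]]:
    """Return the (doc_id, score) pairs at ranks (page_no-1)*k+1 .. page_no*k."""
    # Negate scores so that the smallest tuples are the best documents;
    # ties on score are broken by increasing doc_id.
    keyed = ((-toy_similarity(query, terms), doc_id) for doc_id, terms in docs.items())
    ranked = heapq.nsmallest(page_no * k, keyed)
    # Discard the earlier pages and undo the negation.
    return [(doc_id, -neg) for neg, doc_id in ranked[(page_no - 1) * k:]]


if __name__ == "__main__":
    collection = {1: ["a", "b"], 2: ["a", "a"], 3: ["c"], 4: ["a", "b", "b"], 5: ["b"]}
    print(results_page(collection, ["a", "b"], k=2, page_no=1))
    print(results_page(collection, ["a", "b"], k=2, page_no=2))
```

The baseline repeats all of the first-page work whenever the second page is requested, which is the cost that reusing first-page by-products aims to avoid.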
