Efficient Index-Based Snippet Generation

Hannah Bast, Marjan Celikik

doi:10.1145/2590972

Abstract

Ranked result lists with query-dependent snippets have become state of the art in text search. They are typically implemented by searching, at query time, for occurrences of the query words in the top-ranked documents. This document-based approach has three inherent problems: (i) when a document is indexed by terms that it does not contain literally (e.g., related words or spelling variants), localizing the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice, once on the index side to compute the correct result set and once on the snippet-generation side to generate the appropriate snippets; and (iii) in the worst case, the whole document must be scanned for occurrences of the query words, which can be problematic for very long documents. We present a new index-based method that localizes snippets using only information computed from the index and that overcomes all three problems. Unlike previous index-based methods, we show how to achieve this at essentially no extra cost in query processing time, by a technique we call operator inversion. We also show how our index-based method allows caching individual segments instead of complete documents, which yields a significantly higher cache hit ratio than the document-based approach. We have fully integrated our implementation into the CompleteSearch engine.
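To make point (iii) and the segment-caching idea more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' operator-inversion technique, which the abstract does not describe in detail. It assumes the index returns positional postings as (doc_id, position, matched_term) tuples for the result documents, so a match on a spelling variant or related word already carries a position, and that documents are split into fixed-size segments. SEGMENT_SIZE, localize, SegmentCache and load_segment are hypothetical names introduced only for illustration. Snippet segments are chosen from positions alone, without scanning document text, and only the chosen segments are fetched, through an LRU cache keyed by (doc_id, segment_id).

```python
from collections import OrderedDict, defaultdict

# Hypothetical: number of words per fixed-size document segment.
SEGMENT_SIZE = 50


def localize(postings, top_docs, per_doc=2):
    """Choose snippet segments per top-ranked document from positions alone.

    postings: iterable of (doc_id, position, matched_term) tuples, assumed to
    come from the index's result lists, so no document text is scanned here.
    """
    top = set(top_docs)
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, pos, _term in postings:
        if doc_id in top:
            counts[doc_id][pos // SEGMENT_SIZE] += 1
    chosen = {}
    for doc_id in top_docs:
        segs = counts.get(doc_id, {})
        # Rank segments by number of matches, breaking ties by position.
        best = sorted(segs, key=lambda s: (-segs[s], s))[:per_doc]
        chosen[doc_id] = sorted(best)  # keep document order for display
    return chosen


class SegmentCache:
    """LRU cache keyed by (doc_id, segment_id) instead of whole documents."""

    def __init__(self, capacity, load_segment):
        self.capacity = capacity
        self.load_segment = load_segment  # hypothetical fetch from storage
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id, seg_id):
        key = (doc_id, seg_id)
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        text = self.load_segment(doc_id, seg_id)
        self.data[key] = text
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        return text


if __name__ == "__main__":
    # Toy postings; "snipet" stands for a spelling variant the index matched.
    postings = [(7, 3, "snippet"), (7, 12, "snipet"), (7, 130, "snippet"),
                (9, 60, "index")]
    plan = localize(postings, top_docs=[7, 9], per_doc=1)
    print(plan)  # {7: [0], 9: [1]}

    cache = SegmentCache(2, lambda d, s: f"doc{d}/seg{s}")
    for doc_id, segs in plan.items():
        for s in segs:
            cache.get(doc_id, s)
    cache.get(7, 0)  # repeated request for the same segment
    print(cache.hits, cache.misses)  # 1 2
```

Because the cache key is a segment rather than a whole document, a frequently requested passage of a long document can stay cached without the rest of that document taking up space; this is one plausible reading of why segment caching can raise the hit ratio.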

Journal: ACM Transactions on Information Systems
Publication Date: Apr 1, 2014
Citations: 8

Similar Papers

Fast error-tolerant search on very large texts
Marjan Celikik ... Holger Bast
08 Mar 2009

Numerical Descriptive Inherent Safety Technique (NuDIST) for inherent safety assessment in petrochemical industry
Syaza I Ahmad ... Mimi H Hassim
Process Safety and Environmental Protection | VOL. 92 | 12 Apr 2014

An efficient multiversion access structure
P.J Varman ... R.M Verma
IEEE Transactions on Knowledge and Data Engineering | VOL. 9 | 01 Jan 1997

Effective Lightweight Learning-to-Rank Method Using Unified Term Impacts
Sheila Da N Silva ... Altigran S Da Silva
IEEE Access | VOL. 8 | 01 Jan 2020