Three-Level Caching for Efficient Query Processing in Large Web Search Engines

Xiaohui Long, Torsten Suel

doi:10.1007/s11280-006-0221-0

Abstract

Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level. We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation, based on a large web crawl and a real search engine query log, shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance.
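The three levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `LRUCache` and `ThreeLevelEngine`, the `disk_reads` counter, and the toy index are hypothetical, all three caches live in one process rather than being split between a frontend and index nodes, plain LRU eviction stands in for the weighted caching policies the paper studies, and only pairwise intersections (not projections) are cached.

```python
from collections import OrderedDict
from itertools import combinations


class LRUCache:
    """Tiny LRU cache; a stand-in for the paper's weighted caching policies."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


class ThreeLevelEngine:
    """Hypothetical engine: result cache, pair-intersection cache, list cache."""

    def __init__(self, index, capacity=16):
        self.index = index                   # term -> doc IDs (stand-in for disk)
        self.results = LRUCache(capacity)    # level 1: complete query results
        self.pairs = LRUCache(capacity)      # level 2: intersections of term pairs
        self.lists = LRUCache(capacity)      # level 3: inverted lists
        self.disk_reads = 0

    def _postings(self, term):
        cached = self.lists.get(term)
        if cached is not None:
            return cached
        self.disk_reads += 1                 # list cache miss: go to the index
        postings = set(self.index.get(term, ()))
        self.lists.put(term, postings)
        return postings

    def query(self, terms):
        key = tuple(sorted(set(terms)))
        cached = self.results.get(key)
        if cached is not None:               # level 1 hit: identical query seen
            return cached
        docs, covered = None, set()
        for pair in combinations(key, 2):    # level 2: reuse cached intersections
            inter = self.pairs.get(pair)
            if inter is not None:
                docs = inter if docs is None else docs & inter
                covered.update(pair)
        # Level 3: fetch lists only for terms not covered by a cached pair.
        fetched = {t: self._postings(t) for t in key if t not in covered}
        for postings in fetched.values():
            docs = postings if docs is None else docs & postings
        # Simplistic admission policy (an assumption): cache every pair
        # whose two lists were both fetched for this query.
        for a, b in combinations(sorted(fetched), 2):
            self.pairs.put((a, b), fetched[a] & fetched[b])
        result = sorted(docs) if docs is not None else []
        self.results.put(key, result)
        return result


index = {"web": [1, 2, 3, 5], "search": [2, 3, 4], "cache": [3, 5]}
engine = ThreeLevelEngine(index)
first = engine.query(["web", "search"])            # misses all three levels
repeat = engine.query(["search", "web"])           # served by the result cache
longer = engine.query(["web", "search", "cache"])  # reuses the cached pair
```

In this toy run the third query finds the intersection of "search" and "web" in the intermediate cache, so it only has to fetch the list for "cache". The sketch caches every co-occurring pair, whereas the abstract stresses that admission and eviction policies need careful selection.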


Journal: World Wide Web
Publication Date: Dec 1, 2006
Citations: 21

Similar Papers

Three-level caching for efficient query processing in large Web search engines
Xiaohui Long ... Torsten Suel
01 Jan 2004

Optimized Query Execution in Large Search Engines with Global Page Ordering
Xiaohui Long ... Torsten Suel
Proceedings 2003 VLDB Conference, 01 Jan 2003

Optimizing the Web Search Engines with Features and Caching
Hui Li ... Shu Zhang
01 Oct 2010

Optimized top-k processing with global page scores on block-max indexes
Dongdong Shan ... Xiaoming Li
08 Feb 2012
