Abstract

This chapter discusses optimized query execution in large search engines with global page ordering. Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is the length of the inverted lists for the query terms, which grows with the size of the document collection and is often in the range of many megabytes. To address this issue, information retrieval (IR) and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning the full inverted lists. This chapter focuses on how such techniques can be integrated efficiently into query processing. It studies pruning techniques for query execution in large engines when a global ranking of pages, as provided by Pagerank or any other method, is available in addition to the standard term-based approach. The chapter describes pruning schemes for this case and evaluates their efficiency on an experimental cluster-based search engine with 120 million web pages. The results show that such techniques offer significant potential benefit.
