Abstract

Web search engines are built from components capable of processing large numbers of user queries per second in a distributed way. Among them, the index service computes the top-k documents that best match each incoming query by means of a document ranking operation. To achieve high performance, dynamic pruning techniques such as the WAND and BM-WAND algorithms are used to avoid fully processing all of the documents related to a query during the ranking operation. Additionally, the index service distributes the ranking operations among clusters of processors, and within each processor multi-threading is applied to speed up query processing. In this scenario, a query running time prediction algorithm has practical applications in the efficient assignment of processors and threads to incoming queries. We propose a prediction algorithm for the WAND and BM-WAND algorithms. We experimentally show that our proposal achieves accurate prediction results while significantly reducing execution time and memory consumption compared with an alternative prediction algorithm. Our proposal applies the discrete Fourier transform (DFT) to represent key features affecting query running time, and the resulting vectors are used to train a feed-forward neural network with back-propagation.
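To illustrate how a DFT can turn a variable-length sequence into a fixed-length vector suitable as neural network input, here is a minimal sketch. The abstract does not say which features are transformed, so the input here (a sequence of per-document scores for one term) is purely an assumption, as are the names `dft_features` and `n_coeffs`. This is not the authors' implementation.

```python
import cmath


def dft_features(signal, n_coeffs):
    """Return the magnitudes of the first n_coeffs DFT coefficients.

    Hypothetical helper: `signal` is assumed to be a non-empty sequence of
    numbers (for example, the scores along a posting list). Dividing by the
    length keeps vectors from sequences of different sizes comparable.
    """
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    features = []
    for k in range(n_coeffs):
        # Direct O(n) evaluation of the k-th DFT coefficient.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        features.append(abs(coeff) / n)
    return features
```

Sequences of any length map to vectors of length `n_coeffs`, which is what lets a fixed-size network input layer consume them. A production system would more likely use an FFT routine than this direct summation.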

Highlights

  • Query running time prediction is useful for effective resource management, query optimization, accurate scheduling and user experience management [1]

  • We focus on query running time prediction in Web search engines (WSE)

  • We have presented a new query running time prediction algorithm based on the discrete Fourier transform (DFT)

Summary

Introduction

Future Internet 2021, 13, 204

Query running time prediction is useful for effective resource management, query optimization, accurate scheduling and user experience management [1]. Large-scale Web search engines are designed to process hundreds of thousands of queries per second, where each query has to be processed within a fraction of a second. To achieve this goal, search engines are composed of services capable of processing large amounts of data. One of these services is the index service, which is responsible for calculating the top-k documents for user queries. Weighted AND (WAND) is a dynamic pruning technique that first runs a fast approximate evaluation on candidate documents and then performs the costly full evaluation only on the promising candidates. This enables many documents to be skipped, so the algorithm achieves efficient performance by reducing the total number of full document score evaluations. The BM-WAND variant goes further by skipping consecutive sets of documents, using a block-wise inverted index in which each posting list block has a maximum score.
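The two-stage idea behind WAND can be made concrete with a small sketch. The Python below is an illustrative toy, not the authors' implementation: the in-memory index layout (each term mapped to a list of `(doc_id, score)` pairs sorted by `doc_id`), the `Cursor` helper, and the names `wand_top_k` and `full_evals` are all assumptions introduced here. The cheap approximate step sums per-term score upper bounds to pick a pivot document, and only a pivot that could beat the current top-k threshold is fully scored.

```python
import heapq
from bisect import bisect_left

END = float("inf")  # sentinel doc id for an exhausted posting list


class Cursor:
    """Read position in one term's posting list (hypothetical helper)."""

    def __init__(self, postings):
        self.docs = [doc for doc, _ in postings]
        self.scores = [score for _, score in postings]
        self.upper_bound = max(self.scores)  # largest contribution of this term
        self.pos = 0

    def doc(self):
        return self.docs[self.pos] if self.pos < len(self.docs) else END

    def score(self):
        return self.scores[self.pos]

    def seek(self, target):
        # Jump to the first posting whose doc id is >= target.
        self.pos = bisect_left(self.docs, target, self.pos)


def wand_top_k(index, terms, k):
    """Return (top-k list of (doc_id, score), number of full evaluations)."""
    if k <= 0:
        return [], 0
    cursors = [Cursor(index[t]) for t in terms if index.get(t)]
    heap = []  # min-heap of (score, doc_id); heap[0] is the current k-th best
    full_evals = 0
    while True:
        cursors.sort(key=lambda c: c.doc())
        threshold = heap[0][0] if len(heap) == k else 0.0
        # Approximate step: add upper bounds until they exceed the threshold.
        pivot, bound = None, 0.0
        for c in cursors:
            if c.doc() == END:
                break
            bound += c.upper_bound
            if bound > threshold:
                pivot = c.doc()
                break
        if pivot is None:
            break  # no remaining document can enter the top-k
        if cursors[0].doc() == pivot:
            # Full evaluation: sum the scores of every list on this document.
            score = 0.0
            for c in cursors:
                if c.doc() != pivot:
                    break
                score += c.score()
                c.seek(pivot + 1)
            full_evals += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot))
        else:
            # Documents before the pivot cannot beat the threshold: skip them.
            for c in cursors:
                if c.doc() >= pivot:
                    break
                c.seek(pivot)
    top = sorted(heap, key=lambda e: (-e[0], e[1]))
    return [(doc, score) for score, doc in top], full_evals
```

The second return value counts full evaluations, so it can be compared with the number of distinct documents in the posting lists to see how much work pruning saved. BM-WAND would refine the upper bounds per posting list block rather than per whole list, which this sketch does not attempt.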

Research Objective
Contribution
Outline
Web Search Engines
The WAND and BM-WAND Dynamic Pruning Techniques
Challenges for Query Running Time Prediction
Related Work
A DFT-Based Query Running Time Prediction Algorithm
Term Coefficients
Query Coefficients
Data Collection and Methodology
Learning Methods
Accuracy Evaluation
Performance Evaluation
An Application Case for Query Running Time Prediction
Accuracy Evaluation of the DFT-Based Algorithm under Multi-Threaded
Performance Evaluation of the DFT-Based Algorithm under Multi-Threaded
Conclusions