Abstract

Achieving interactive response times when searching for documents on the web has become a challenge, especially given the tremendous growth in the volume of available information. Incorporating parallelism into search engines is one approach to meeting this challenge. In this paper, we present a model for parallel query processing and extend it for use on shared-memory and cluster parallel architectures. A simulator reflecting the proposed model was developed, allowing parameters concerning the data set, the queries and the architectures to be varied. A total of 32 experiments were conducted, and performance measures such as average response time, speedup and efficiency were computed to study the effect of varying these parameters. The results show that, in terms of average response time, speedup and efficiency, the proposed model for parallel query processing on the shared-memory architecture outperforms that on the cluster-based architecture.
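
As background for how these measures are typically derived, the following minimal sketch computes speedup and efficiency from average response times; the function names and sample timings are illustrative assumptions, not values taken from the paper.

```python
# Illustrative computation of the standard performance measures named in the
# abstract (speedup and efficiency); the timings below are hypothetical.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(s: float, num_processors: int) -> float:
    """Efficiency E = S / p, i.e. speedup normalised by processor count."""
    return s / num_processors


# Example: an average response time of 0.80 s on one processor versus
# 0.25 s on 4 processors (made-up figures, not results from the paper).
s = speedup(0.80, 0.25)   # 3.2
e = efficiency(s, 4)      # 0.8
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```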

Highlights

  • With the dramatic increase in the size of the web, search engines have had to scale up to keep pace with this growth

  • We present a model for parallel query processing

  • A number of performance measures such as average response time, speedup and efficiency are computed to study the effect of varying the parameters

Introduction

With the dramatic increase in the size of the web, search engines have had to scale up to keep pace with this growth. In 1997, the top search engines indexed from 2 million to 200 million web documents. Estimates of the number of indexed pages have reached at least 15.78 billion, as reported for Yahoo, Google and Bing in Daily estimated size of the World Wide Web (2013). The number of internet users has also grown enormously, as the world wide web has become the primary source of information. In 2011, the average number of queries per day on Google reached around 4 billion, as reported in Google Annual Search Statistics (2013). We attempt to index documents in such a way that parallelism is exploited in query processing, and we propose an approach that parallelizes query processing on shared-memory and cluster-based architectures.
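
To make the idea concrete, the sketch below shows one common way of exploiting parallelism in query processing: a document-partitioned inverted index queried by one thread per partition on a shared-memory machine. The collection, partitioning scheme and helper functions are illustrative assumptions only and do not reproduce the model or parameters proposed in this paper.

```python
# Minimal sketch: conjunctive query processing over a document-partitioned
# inverted index using one thread per partition (shared-memory style).
# The collection, partitioning and query below are illustrative only.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build a simple inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def query_partition(index, terms):
    """Answer a conjunctive (AND) query against one partition."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical collection split into two document-based partitions.
partitions = [
    {1: "parallel query processing", 2: "inverted index partitioning"},
    {3: "query processing on clusters", 4: "shared memory architectures"},
]
indexes = [build_index(p) for p in partitions]

query_terms = ["query", "processing"]
with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
    partial_results = pool.map(lambda idx: query_partition(idx, query_terms),
                               indexes)

# Merge the per-partition hits; prints the ids of documents matching all terms.
print(sorted(set().union(*partial_results)))
```

Under a cluster architecture the same document-based partitioning applies, but each partition resides on a separate node and the partial results are merged over the network rather than in shared memory.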

Related Work
Inverted Indices
Document-Based and Term-Based Partitioning
Inverted Index
Inverted Index Partitioning
Using the Shared-Memory Model
Using Clusters
Simulation
Queries and Inverted Index Generator
Generating Queries
Generating Inverted Index
Query Processing on Shared-Memory Simulator
Query Processing on Clusters Simulator
Parameters and Experiments
Results and Discussion