Abstract

Achieving interactive response times when searching for documents on the web has become a challenge, especially given the tremendous growth in the volume of available information. Incorporating parallelism into search engines is one approach to meeting this challenge. In this paper, we present a model for parallel query processing and extend it for use on shared-memory and cluster parallel architectures. A simulator reflecting the proposed model was developed, allowing parameters concerning the data set, the queries and the architectures to be varied. A total of 32 experiments were conducted, and performance measures such as average response time, speedup and efficiency were computed to study the effect of varying these parameters. The results show that, in terms of average response time, speedup and efficiency, the proposed model for parallel query processing on the shared-memory architecture outperforms that on the cluster-based architecture.
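
As background for how these measures are typically derived, the following minimal sketch computes speedup and efficiency from average response times; the function names and sample timings are illustrative assumptions, not values taken from the paper.

```python
# Illustrative computation of the standard performance measures named in the
# abstract (speedup and efficiency); the timings below are hypothetical.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(s: float, num_processors: int) -> float:
    """Efficiency E = S / p, i.e. speedup normalised by processor count."""
    return s / num_processors


# Example: an average response time of 0.80 s on one processor versus
# 0.25 s on 4 processors (made-up figures, not results from the paper).
s = speedup(0.80, 0.25)   # 3.2
e = efficiency(s, 4)      # 0.8
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```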

Highlights

  • With the dramatic increase in the size of the web, search engines have had to scale up to keep pace with this growth

  • We present a model for parallel query processing

  • A number of performance measures such as average response time, speedup and efficiency are computed to study the effect of varying the parameters

Introduction

With the dramatic increase in the size of the web, search engines have had to scale up to keep pace with this growth. In 1997, the top search engines indexed from 2 million to 200 million web documents. Estimates of the number of indexed pages have reached at least 15.78 billion, as reported for Yahoo, Google and Bing in Daily estimated size of the World Wide Web (2013). The number of internet users has also grown enormously, as the world wide web has become the primary source of information. In 2011, the average number of queries per day on Google reached around 4 billion, as reported in Google Annual Search Statistics (2013). We attempt to index documents in such a way that parallelism is exploited in query processing, and we propose an approach that parallelizes query processing on shared-memory and cluster-based architectures.
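
To make the idea concrete, the sketch below shows one common way of exploiting parallelism in query processing: a document-partitioned inverted index queried by one thread per partition on a shared-memory machine. The collection, partitioning scheme and helper functions are illustrative assumptions only and do not reproduce the model or parameters proposed in this paper.

```python
# Minimal sketch: conjunctive query processing over a document-partitioned
# inverted index using one thread per partition (shared-memory style).
# The collection, partitioning and query below are illustrative only.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build a simple inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def query_partition(index, terms):
    """Answer a conjunctive (AND) query against one partition."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical collection split into two document-based partitions.
partitions = [
    {1: "parallel query processing", 2: "inverted index partitioning"},
    {3: "query processing on clusters", 4: "shared memory architectures"},
]
indexes = [build_index(p) for p in partitions]

query_terms = ["query", "processing"]
with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
    partial_results = pool.map(lambda idx: query_partition(idx, query_terms),
                               indexes)

# Merge the per-partition hits; prints the ids of documents matching all terms.
print(sorted(set().union(*partial_results)))
```

Under a cluster architecture the same document-based partitioning applies, but each partition resides on a separate node and the partial results are merged over the network rather than in shared memory.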

Related Work
Inverted Indices
Document-Based and Term-Based Partitioning
Inverted Index
Inverted Index Partitioning
Using the Shared-Memory Model
Using Clusters
Simulation
Queries and Inverted Index Generator
Generating Queries
Generating Inverted Index
Query Processing on Shared-Memory Simulator
Query Processing on Clusters Simulator
Parameters and Experiments
Results and Discussion