Abstract

Search engines require machines with substantial computing resources to crawl web pages, and large data stores to hold the billions of pages collected from the World Wide Web once they have been parsed and indexed. The indexer is one of the main components of a search engine and sits between the crawler and the searcher. Indexing is the process of organizing the collected data to facilitate information retrieval and minimize query time. It demands considerable processing and storage resources and strongly affects search engine performance; the size of this effect depends on the structure of the index and on how it is constructed. Distributing the indexing process over a cluster of computers in a grid environment will improve performance in two ways: the parsing load is spread over a number of computers, and the indexed data is partitioned by term across the distributed memory of remote machines. Because search engine data collections change frequently, the indexer also requires dynamic indexing. Merging distributed and dynamic indexing in a single architecture over grid computing will therefore give better performance by using the available resources, without the need for high-cost machines such as supercomputers.

General Terms: Grid Computing, Algorithms, Inverted Index, World Wide Web.
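
The term-based distribution and dynamic updating described in the abstract can be illustrated with a small single-process sketch. It is not the architecture proposed in the paper: the class `TermPartitionedIndex`, the `tokenize` helper, and the choice of a CRC32 hash to assign terms to shards are all assumptions made for illustration, and each in-memory shard merely stands in for the memory of one grid node.

```python
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermPartitionedIndex:
    """Toy inverted index whose postings are split across shards by term."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # Each shard stands in for one grid node: term -> set of document ids.
        self.shards = [defaultdict(set) for _ in range(num_shards)]
        # Remember each document's terms so it can be removed or re-indexed.
        self.doc_terms = {}

    def shard_for(self, term):
        # crc32 is stable across runs, unlike Python's built-in str hash.
        return zlib.crc32(term.encode("utf-8")) % self.num_shards

    def add_document(self, doc_id, text):
        # Dynamic update: re-adding a known id replaces its old postings.
        if doc_id in self.doc_terms:
            self.remove_document(doc_id)
        terms = set(tokenize(text))
        for term in terms:
            self.shards[self.shard_for(term)][term].add(doc_id)
        self.doc_terms[doc_id] = terms

    def remove_document(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            shard = self.shards[self.shard_for(term)]
            shard[term].discard(doc_id)
            if not shard[term]:
                del shard[term]  # drop empty posting lists

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        postings = [self.shards[self.shard_for(t)].get(t, set()) for t in terms]
        return set.intersection(*postings)


if __name__ == "__main__":
    index = TermPartitionedIndex(num_shards=4)
    index.add_document(1, "grid computing for search engines")
    index.add_document(2, "dynamic inverted index on a grid")
    print(sorted(index.search("grid")))
    index.add_document(1, "web crawler")  # page 1 changed; re-index it
    print(sorted(index.search("grid")))
```

In a real grid deployment each shard would live on a separate machine and a query would be routed only to the nodes that own its terms; the sketch keeps everything in one process so that the partitioning and update logic stay visible.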
