Abstract

This thesis describes our research results in the context of peer-to-peer information retrieval (P2P-IR). One goal in P2P-IR is to build a search engine for the World Wide Web (WWW) that runs on up to hundreds of thousands or even millions computers distributed all over the world. The idea is not only to distribute the content, e.g., web pages, but also an index for searching this content. The main focus of this thesis lies on designing an overlay network that is capable of transporting data between the different parts of such a distributed search engine. We built a Distributed Hash Table (DHT) that is able to sustain and efficiently handle high traffic loads, which are typically generated by a distributed IR application. We first analyze the behavior of a state-of-the-art DHT under heavy load and show that a DHT can suffer a so-called collapse if it does not have a congestion control mechanism. We propose different ways of integrating congestion control into DHTs to achieve stable behavior of the system under heavy load. We then look into mechanisms for increasing the throughput of a DHT by adapting its routing function to perceived congestion. We propose an algorithm that avoids congested parts of the DHT and thus increases the throughput by exploiting underutilized resources. We evaluate our fully operational DHT prototype using a ModelNet cluster and the PlanetLab testbed to assess the performance of the proposed algorithms. Furthermore, we describe an architecture of a P2P search engine for the WWW. We propose mechanisms to create a highly distributed document index. The main idea is to split the index into very small parts by using so-called highly discriminative keys. We thus achieve an extremely distributed storage of the index, which allows for high parallelism during indexing and querying. We evaluate the performance of our indexing approach with a P2P-IR prototype, which is built on top of our high-performance DHT.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.