Abstract

Distributed peer-to-peer storage systems rely on the voluntary participation of peers to manage a shared storage pool. Files are generally replicated at several sites to provide acceptable levels of availability. If disk space on these peers is not carefully monitored and provisioned, the system may be unable to guarantee availability for certain files. In particular, the identification and elimination of redundant data are important problems that arise in long-lived systems. Scalability and availability are competing goals in these networks: scalability concerns dictate aggressive elimination of replicas, while availability considerations argue the converse. In this paper, the authors present a novel and efficient solution that addresses both goals with respect to the management of redundant data. Specifically, they address the problem of duplicate elimination in systems connected over an unstructured peer-to-peer network, in which there is no a priori binding between an object and its location. They propose a new randomized protocol that solves this problem in a scalable and decentralized fashion without compromising the availability requirements of the application. Performance results from both large-scale simulations and a prototype built on PlanetLab demonstrate that the protocols provide high probabilistic guarantees of success while incurring minimal administrative overhead.
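The abstract does not describe the protocol's mechanics, but the tension it highlights, aggressive replica elimination versus availability, can be illustrated with a small Monte Carlo sketch. The model below is a hypothetical illustration only, not the paper's actual protocol: it assumes each of `n` replica holders independently retains its copy with probability `p`, so a file remains available whenever at least one holder keeps a copy, which happens with probability 1 - (1 - p)^n.

```python
import random


def estimate_availability(num_replicas, p_retain, trials=10_000, seed=0):
    """Monte Carlo estimate of the probability that at least one replica
    survives a round of randomized elimination.

    Hypothetical model (not the paper's protocol): each holder
    independently keeps its copy with probability p_retain.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # The file stays available if any one holder retains its copy.
        if any(rng.random() < p_retain for _ in range(num_replicas)):
            survived += 1
    return survived / trials


if __name__ == "__main__":
    # With 4 replicas and p = 0.5, the analytic value is 1 - 0.5**4 = 0.9375.
    print(estimate_availability(4, 0.5))
```

The sketch makes the trade-off concrete: lowering `p_retain` frees more disk space (more duplicates eliminated) but raises the chance that every holder discards its copy, which is exactly the availability risk the paper's randomized protocol is designed to bound.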
