PEER-TO-PEER (P2P) computing has attracted much attention from both the academic community and industry. This is fueled by the successful deployment and adoption of many domain-specific P2P systems. For example, Freenet and Gnutella enable users to share digital files of any kind (e.g., music files, video, images), Napster allows sharing of (mp3) music files, ICQ facilitates exchanges of personal messages, SETI@home makes computing cycles of participants available, and LOCKSS pools storage resources to archive document collections. In P2P systems, autonomous peers (computers) are treated as equals, i.e., they perform the same functions. They can join and leave the system at any time. These peers pool their resources (data, storage, computing cycles) to enable new capabilities greater than the sum of the parts. Data can be exchanged between peers directly, and underutilized resources can be tapped. The potential of such a highly distributed and decentralized system is tremendous.

Interestingly, existing P2P systems lack the data management capabilities typically found in a DBMS. Although research in distributed (and heterogeneous) databases has been pursued for many years, the database community has not been as aggressive in enhancing P2P systems with data management capabilities. We would add that the current P2P paradigm offers challenges beyond what has previously been done in the distributed database context. To list a few: the system may scale to thousands or tens of thousands of peers, which existing techniques cannot adequately handle; the dynamism of the system raises issues of information quality (e.g., completeness, consistency) that have not previously been considered; and the trustworthiness of the participating peers poses security threats not seen before. This special section aims to bring together current research activities that address some of these problems.
The section contains six papers covering data integration, search, consistency, trust, and identity. We hope this section will whet the appetite of our community to pursue this exciting field further.

In a peer-based data management system, it is practically impossible to construct a global schema that mediates semantic differences of shared data across a large number of autonomous peers. The first paper, “The Piazza Peer Data Management System” by Alon Y. Halevy, Zachary G. Ives, Jayant Madhavan, Peter Mork, Dan Suciu, and Igor Tatarinov, proposes a solution that facilitates ad hoc, decentralized sharing and administration of data, as well as the definition of semantic relationships. Every peer can contribute new data, relate the data to existing concepts and schemas, and define new schemas for other peers to use as a frame of reference for their queries. The paper also discusses query answering and optimization algorithms.