Balancing the load in a decentralized P2P system is a challenging problem due to the dynamic nature of such environment and the absence of global knowledge about the actual composition of the system.In this paper, we address the problem of load balancing in large scale and selforganizing P2P systems managing multidimensional data. We propose simple and efficient decentralized mechanisms to evenly distribute the data load among the participating nodes in Content Addressable Networks. The basic idea is to enable a new node that joins the system to share the load with a heavily loaded node which is already in the system, such that the load is still evenly distributed among all the participating nodes. In the multiple random choices method, the new node probes the load of some existing nodes selected uniformly at random, then chooses the heaviest node among them to share the load with. In this paper, we extend this method in three ways. First, the new node probes a pool of nodes proportional to the network size and composition. Specifically, the number of probed nodes is logarithmic to the network size. This property enables to achieve a constant load imbalance factor which is very small without the need to estimate the network size. Second, the probed nodes are not selected at random, but they are well spread over the key space; which enables a good estimation of the actual data distribution and network composition, which enables to cope well with large-scale data imbalance. Third, the selection of nodes to probe is restricted to the immediate and distant neighbors of a randomly chosen node. The cost incurred by our join-based load balancing method is very small, since all load information is piggybacked to periodic maintenance messages exchanged between nodes and their neighbors. Unlike other methods, we do not make use of external index nor assume any global knowledge. We also generalize the first method to enable locating a heavily loaded node through a sequential walk starting from a randomly selected node. This new method incurs additional overhead, however it achieves much smaller load imbalance. We also study the robustness of our join-based load balancing method against adversarial attacks. Using simulation, we analyze the impact of the number of entry points on load balancing. To the best of our knowledge we are the first to address this problem. We conduct an experimental study using uniform and nonuniform data distributions to demonstrate the effectiveness and the scalability of our proposals.
Read full abstract