Load balancing is a critical problem in storage clusters. Existing algorithms often incur high communication overhead because they must collect enough information to dispatch requests for hotspot data fairly. We propose self-adaptive replication management (SARM), an efficient scheme that achieves approximately optimal load balancing while keeping communication overhead low. Our approach estimates the access strength of hotspot data and establishes an adequate number of replicas on nodes based on their load conditions. Each node uses a dynamic scheduling algorithm to handle requests for hotspot data. If the load conditions of all nodes to which requests can be dispatched exceed the fair load estimate, a minimum scheduling algorithm is used to dispatch the requests; otherwise, a probabilistic scheduling algorithm is adopted instead. In other words, SARM automatically switches between scheduling algorithms according to the fair load estimates and the load conditions on nodes. Consequently, it eliminates request burstiness while achieving stable load balancing. To avoid excessive communication overhead, the fair load estimates are updated at fixed time intervals. Moreover, when the load variation on a node exceeds a specific threshold, its load conditions are dynamically propagated to the other nodes. Finally, we also consider data availability in SARM. We present simulations and an analysis of the performance of our approach compared with other schemes under a variety of load conditions.
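The switch between the two scheduling algorithms can be illustrated with a minimal sketch. The abstract does not define the fair load estimate or the probabilistic rule, so the code below makes two assumptions: the fair load is taken to be the mean load over the replica-holding nodes, and the probabilistic scheduler weights each underloaded node by its headroom below that estimate. The names `fair_load` and `dispatch` are hypothetical and do not come from the paper.

```python
import random


def fair_load(loads):
    """Assumed fair load estimate: mean load over replica-holding nodes."""
    return sum(loads.values()) / len(loads)


def dispatch(loads, fair, rng=random):
    """Choose a replica node for one hotspot request.

    loads: dict mapping node name -> current load
    fair:  fair load estimate (refreshed periodically elsewhere)
    rng:   random source providing choices(); defaults to the random module
    """
    # Headroom of every node that is strictly below the fair load estimate.
    headroom = {n: fair - load for n, load in loads.items() if load < fair}

    if not headroom:
        # No node is below the estimate: fall back to minimum scheduling
        # and send the request to the least-loaded node.
        return min(loads, key=loads.get)

    # Otherwise use probabilistic scheduling. Weighting by headroom is an
    # assumption made for this sketch, not the rule from the paper.
    nodes = list(headroom)
    weights = [headroom[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

For example, with loads `{"a": 2, "b": 4, "c": 12}` and a fair load of 6, the request goes to `a` or `b` with probability proportional to their headroom (4 and 2), and never to `c`. Spreading requests over all underloaded nodes, instead of always picking the single least-loaded one, is what keeps many dispatchers from piling onto the same node between load updates.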