The costs of building and using a data network typically depend on the carried load, and one of the primary challenges in managing such networks is the need to reduce peak traffic and achieve temporal load balancing. In this paper we focus on the portion of network traffic generated by user backups, and we address the problem of shifting backup traffic to off-peak hours within the constraints imposed by user connectivity and the distributed nature of the backup protocol.
Consider a group of users who need to regularly back up data to a central server. Users can only perform a backup when connected to the network, and user connectivity may vary over time. Connectivity is typically correlated with the overall load, which creates a tradeoff between reducing peak load and ensuring regular backups. Backup attempts are initiated locally, without knowledge of the status of other users or of the overall network load. The backup activity of connected users is governed by a probabilistic algorithm, characterized by backup probabilities that depend on the hour of the day and the time since the previous backup.
This work is part of a joint project between IBM Research and the IBM divisions Europe Innovations Team and Integrated Technology Delivery. It aims to design the backup probabilities that minimize the networking costs while ensuring that the time between successive backups remains small.
We have not found any study in the literature on temporal load balancing for distributed backup scheduling. The term load balancing more often refers to the practice of distributing workload evenly across multiple workstations or servers to achieve optimal resource utilization. The most relevant study that we are aware of is by Sandnes and Huang [2], in which a temporal load balancing strategy for distributed web applications is proposed and analyzed.
The problem of temporal load balancing is also related to incentivizing commuters to travel at less congested times [1] and to peak shaving in power systems [3]. However, the distributed nature and high delay tolerance of our setting create unique challenges.
The remainder of this extended abstract is structured as follows. In Section 2 we present the model, and in Section 3 we describe our algorithm for deriving the optimal backup probabilities. In Section 4 this approach is applied to balance traffic on the data network of an IBM location. We conclude by discussing future research directions in Section 5.
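To make the probabilistic backup rule described above concrete, the following Python sketch simulates users who each decide locally, hour by hour, whether to back up. It is only an illustration under stated assumptions and is not the model of Section 2 or the optimized probabilities of Section 3: the off-peak window, the probability values, the minimum and maximum gaps, and the connectivity model are all hypothetical placeholders chosen for readability.

```python
import random

HOURS_PER_DAY = 24
MIN_GAP_HOURS = 20   # hypothetical: no new backup shortly after the last one
MAX_GAP_HOURS = 72   # hypothetical: force an attempt once a backup is this old


def backup_probability(hour, hours_since_backup):
    """Hypothetical backup probability for a connected user.

    Depends only on the hour of the day and the time since the previous
    backup, as in the setting described above. The numbers are invented.
    """
    if hours_since_backup < MIN_GAP_HOURS:
        return 0.0
    if hours_since_backup >= MAX_GAP_HOURS:
        return 1.0
    off_peak = hour < 7 or hour >= 20
    return 0.5 if off_peak else 0.02


def is_connected(hour, rng):
    """Hypothetical connectivity: users are mostly online in office hours."""
    p = 0.9 if 9 <= hour < 17 else 0.2
    return rng.random() < p


def simulate(num_users=200, num_days=14, seed=0):
    """Return (backups started per hour of day, largest backup age seen)."""
    rng = random.Random(seed)
    # Each user starts with a random backup age below one day.
    age = [rng.randrange(HOURS_PER_DAY) for _ in range(num_users)]
    load = [0] * HOURS_PER_DAY
    max_age_seen = 0
    for t in range(num_days * HOURS_PER_DAY):
        hour = t % HOURS_PER_DAY
        for u in range(num_users):
            age[u] += 1
            max_age_seen = max(max_age_seen, age[u])
            # Each user decides locally, without seeing the network load.
            if is_connected(hour, rng) and \
                    rng.random() < backup_probability(hour, age[u]):
                load[hour] += 1
                age[u] = 0
    return load, max_age_seen


if __name__ == "__main__":
    load, max_age = simulate()
    print("backups per hour of day:", load)
    print("largest time since last backup (hours):", max_age)
```

Varying the placeholder probabilities shows the tradeoff discussed above: lowering the peak-hour probability shifts backups to off-peak hours, but because fewer users are connected then, the time between successive backups tends to grow.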