Abstract

In this survey we provide an overview of recent advances in scalable load balancing schemes that achieve favorable delay performance while requiring minimal implementation overhead. The basic load balancing scenario involves a single dispatcher where tasks arrive and must immediately be forwarded to one of $N$ single-server queues. The join-the-shortest-queue (JSQ) policy yields vanishing delays as $N$ grows large, as in a centralized queueing arrangement, but involves a prohibitive communication burden. In contrast, JSQ($d$) schemes that assign an incoming task to a server with the shortest queue among $d$ servers selected uniformly at random require little communication, but lead to constant delays. To examine this fundamental trade-off between delay performance and implementation overhead, we discuss a body of recent research on JSQ($d(N)$) schemes in which the diversity parameter $d(N)$ depends on $N$, and we investigate the growth rate of $d(N)$ required to match the optimal JSQ performance on fluid and diffusion scales. Stochastic coupling techniques and scaling limits play an instrumental role in establishing this asymptotic optimality. We demonstrate how this methodology carries over to infinite-server settings, finite buffers, multiple dispatchers, servers arranged on graph topologies, and token-based load balancing schemes such as join-the-idle-queue (JIQ), thus providing a broad overview of the main trends in the field.
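As a point of reference for the dispatching policies mentioned above, the following is a minimal Python sketch of the JSQ($d$) rule alongside plain JSQ (which corresponds to $d = N$). The function names and the toy queue-length vector are illustrative assumptions for this sketch and are not taken from the survey.

```python
import random

def jsq_d_dispatch(queue_lengths, d, rng=random):
    """JSQ(d): sample d servers uniformly at random (without replacement)
    and return the index of the one with the shortest queue."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])

def jsq_dispatch(queue_lengths):
    """Plain JSQ: inspect all N queues and return the index of the
    shortest one, i.e. JSQ(d) with d = N."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])

if __name__ == "__main__":
    # Toy example: dispatch one task to a pool of N = 10 servers.
    queues = [3, 0, 5, 2, 7, 1, 4, 0, 6, 2]  # current queue lengths
    print("JSQ(2) choice:", jsq_d_dispatch(queues, d=2))
    print("JSQ choice:   ", jsq_dispatch(queues))
```

The sketch highlights the communication trade-off discussed in the abstract: JSQ($d$) only queries $d$ queue lengths per arrival, whereas JSQ requires the state of all $N$ queues.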
