Resource pooling is becoming increasingly common in modern applications of stochastic systems, such as in computer systems, wireless networks, workforce management, call centers, and health care delivery. At the same time, these applications give rise to systems that continue to grow in size. For instance, a traditional web server farm has only a few servers, while cloud data centers have thousands of processors. These two trends place significant practical restrictions on admission, routing, and scheduling decision rules or algorithms. Scalability and computability are becoming ever more important characteristics of decision rules, and consequently simple decision rules with good performance are of particular interest. An example is the so-called least-connection rule implemented in many load balancers in computer clouds, which assigns a task to the server with the fewest active connections; cf. the join-the-shortest-queue routing policy. From a design point of view, the search for desirable algorithmic features often presents trade-offs between system performance, information/communication, and required computational effort.

In this paper, we study the trade-off between performance and computational effort in a stylized model of a system with a central server and a large number of parallel buffers. We focus on randomized versions of the longest-queue-first scheduling policy. Under this scheduling algorithm, the server works on a task from the buffer with the longest queue length among several sampled buffers; it approximates the longest-queue-first scheduling policy, which can be computationally prohibitive. Our aim is to quantify system performance as a function of the computational effort expended on sampling.

In our model, each buffer is fed with an independent stream of tasks, which arrive according to a Poisson process. All n buffers are connected to a single centralized server. Under the randomized longest-queue-first policy, this server selects d(n) buffers uniformly at random (with replacement) and processes a task from the longest queue among the selected buffers; it idles for a random amount of time if all buffers in the sample are empty. Tasks have random processing time requirements. The total processing capacity scales linearly with n, and the processing time distribution is independent of n. We work in an underloaded regime, with enough processing capacity to eventually serve all arriving tasks. Note that this scheduling algorithm is agnostic in the
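To make the sampling-and-dispatch step concrete, the following is a minimal discrete-event simulation sketch of the randomized longest-queue-first policy described above. It is an illustration under assumptions not fixed by the abstract: exponential task sizes with mean 1, exponentially distributed idle periods, server speed equal to n (so total capacity scales linearly with n), and the n independent Poisson streams represented as one merged Poisson process with uniform routing. The parameter names (lam, d, mean_idle, horizon) are illustrative, not the paper's notation.

```python
import heapq
import random


def simulate_rlqf(n=100, d=2, lam=0.8, mean_idle=0.05,
                  horizon=2_000.0, seed=42):
    """Sketch of randomized longest-queue-first (LQF) scheduling.

    Assumptions (illustrative only): Exp(1) task sizes, Exp idle periods
    with mean `mean_idle`, server speed n.  Each of the n buffers receives
    Poisson(lam) arrivals with lam < 1, so the system is underloaded.
    Returns the time-averaged total queue length.
    """
    rng = random.Random(seed)
    speed = n                      # total capacity scales linearly with n
    queues = [0] * n               # queue length of each buffer
    t, area, total = 0.0, 0.0, 0   # clock, integral of total queue, tasks queued

    # Event list: (time, kind), kind in {"arrival", "server"}.
    events = [(rng.expovariate(n * lam), "arrival"), (0.0, "server")]
    heapq.heapify(events)

    while t < horizon:
        new_t, kind = heapq.heappop(events)
        area += total * (new_t - t)          # accumulate queue-length integral
        t = new_t

        if kind == "arrival":
            queues[rng.randrange(n)] += 1    # merged Poisson stream, uniform buffer
            total += 1
            heapq.heappush(events, (t + rng.expovariate(n * lam), "arrival"))
        else:
            # Server is free: sample d buffers uniformly at random, with replacement.
            sample = [rng.randrange(n) for _ in range(d)]
            target = max(sample, key=lambda i: queues[i])   # longest sampled queue
            if queues[target] > 0:
                queues[target] -= 1                         # start serving one task
                total -= 1
                service = rng.expovariate(1.0) / speed      # Exp(1) size at speed n
                heapq.heappush(events, (t + service, "server"))
            else:
                # All sampled buffers are empty: idle for a random time.
                idle = rng.expovariate(1.0 / mean_idle)
                heapq.heappush(events, (t + idle, "server"))

    return area / t


if __name__ == "__main__":
    for d in (1, 2, 5, 20):
        print(f"d = {d:>2}: mean total queue length ~ {simulate_rlqf(d=d):.2f}")
```

Running the sketch for increasing sample sizes d gives a rough sense of the performance/effort trade-off the paper studies: larger samples approximate longest-queue-first more closely at the cost of more sampling work per service decision.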