Abstract
Reducing latency in distributed computing and data storage systems is gaining increasing importance. Several empirical works have reported on the efficacy of scheduling redundant requests in such systems. That is, one may reduce job latency by 1) scheduling the same job at more than one server and 2) waiting only for the fastest of them to respond. Several theoretical models have been proposed to explain the power of redundant requests, and all of the existing results rely heavily on a common assumption: all redundant requests of a job can be immediately cancelled as soon as one of them is completed. We study how one should schedule redundant requests when this assumption does not hold. This is of great importance in practice, since cancellation of running jobs typically incurs non-negligible delays. To bridge the gap between the existing models and practice, we propose a new queueing model that captures such cancellation delays. We then find how one can schedule redundant requests to achieve the optimal average job latency under the new model. Our results show that even with a small cancellation overhead, the actual optimal scheduling policy differs significantly from the optimal policy for the zero-overhead case. Furthermore, we study optimal dynamic scheduling policies, which appropriately schedule redundant requests based on the number of jobs in the system. Our analysis reveals that for the two-server case, the optimal dynamic scheduler can achieve 7%–16% lower average job latency compared with the optimal static scheduler.
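To make the redundancy-with-cancellation-overhead trade-off concrete, the following is a minimal Monte Carlo sketch of a two-server system under a static full-replication policy, assuming Poisson arrivals, exponential service times, and a fixed cancellation delay on the losing replica. The parameter names, the simplified dispatch rule (a replicated job starts only when both servers are idle), and the function itself are our own illustration, not the paper's model.

```python
import random

def simulate_full_replication(n_jobs, arrival_rate, service_rate,
                              cancel_delay, seed=0):
    """Average latency under a static full-replication policy (illustrative).

    Every job is sent to both servers; it completes when the faster
    replica finishes, and the slower replica is then cancelled, which
    keeps its server busy for an extra `cancel_delay` time units.
    """
    rng = random.Random(seed)
    t_arrival = 0.0
    server_free = [0.0, 0.0]   # earliest time each server becomes idle
    total_latency = 0.0
    for _ in range(n_jobs):
        t_arrival += rng.expovariate(arrival_rate)
        # Replication needs both servers, so wait until both are idle.
        start = max(t_arrival, max(server_free))
        replica_times = [rng.expovariate(service_rate) for _ in range(2)]
        finish = start + min(replica_times)   # faster replica wins
        total_latency += finish - t_arrival
        # Winning server frees up at `finish`; losing server pays the
        # cancellation overhead before it can serve the next job.
        server_free = [finish, finish + cancel_delay]
    return total_latency / n_jobs

if __name__ == "__main__":
    # Illustrative comparison: the same replication policy with and
    # without cancellation overhead.
    print(simulate_full_replication(100_000, 0.5, 1.0, cancel_delay=0.0))
    print(simulate_full_replication(100_000, 0.5, 1.0, cancel_delay=0.3))
```

Running the two calls side by side shows how even a modest cancellation delay inflates the average latency of an aggressive replication policy, which is the effect the proposed queueing model is designed to capture.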