Abstract

Communication in datacenter jobs (such as the shuffle operations in MapReduce applications) often involves many parallel flows, which may be processed simultaneously. This highly parallel structure presents new scheduling challenges in optimizing job-level performance objectives in datacenters. Chowdhury and Stoica introduced the coflow abstraction to capture these communication patterns, and recently Chowdhury et al. developed effective heuristics to schedule coflows. In this paper, we consider the problem of efficiently scheduling coflows with release dates so as to minimize the total weighted completion time, which has been shown to be strongly NP-hard. Our main result is the first polynomial-time deterministic approximation algorithm for this problem, with an approximation ratio of 67/3, and a randomized version of the algorithm, with a ratio of 9 + 16√2/3. Our results use techniques from both combinatorial scheduling and matching theory, and rely on a clever grouping of coflows. We also run experiments on a Facebook trace to test the practical performance of several algorithms, including our deterministic algorithm. Our experiments suggest that simple algorithms provide effective approximations of the optimal solution, and that our deterministic algorithm has near-optimal performance.

