Abstract

In data-parallel computing frameworks, intermediate data produced at various stages often needs to be transferred among servers in the datacenter network (e.g., the shuffle phase in MapReduce). A stage often cannot start or complete until all required data pieces from the preceding stage have been received. Coflow is a recently proposed networking abstraction that captures such communication patterns. We consider the problem of efficiently scheduling coflows with release dates in a shared datacenter network so as to minimize the total weighted completion time of coflows. Several heuristics have recently been proposed for this problem, as well as a few polynomial-time approximation algorithms with provable performance guarantees. Our main result is a polynomial-time deterministic algorithm that improves on the best previously known results. Specifically, we propose a deterministic algorithm with an approximation ratio of 5, improving on the best previously known ratio of 12. For the special case in which all coflows are released at time zero, our deterministic algorithm achieves an approximation ratio of 4, improving on the best previously known ratio of 8. The key ingredient of our approach is an improved linear program formulation for sorting the coflows, followed by a simple list scheduling policy. Extensive simulation results, using both synthetic and real traffic traces, verify the performance of our algorithm and show improvement over prior approaches.
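The objective named above can be illustrated with a minimal sketch. It assumes that a coflow completes when its last flow finishes and that the objective is the weighted sum of these completion times. The function names, coflow labels, finish times, and weights are hypothetical and chosen only for illustration; this is not the paper's algorithm, only the quantity it aims to minimize.

```python
from typing import Dict, List


def coflow_completion_time(flow_finish_times: List[float]) -> float:
    """A coflow is complete only when all of its flows have finished,
    so its completion time is the latest finish time among its flows."""
    return max(flow_finish_times)


def total_weighted_completion_time(
    finish: Dict[str, List[float]], weights: Dict[str, float]
) -> float:
    """Sum of weight * completion time over all coflows."""
    return sum(weights[k] * coflow_completion_time(finish[k]) for k in finish)


# Hypothetical example: coflow "A" has two flows, coflow "B" has one.
finish = {"A": [2.0, 5.0], "B": [3.0]}
weights = {"A": 1.0, "B": 2.0}

total = total_weighted_completion_time(finish, weights)
print(total)  # 1.0 * 5.0 + 2.0 * 3.0 = 11.0
```

A scheduler changes the per-flow finish times; the approximation ratios quoted above bound how far the resulting objective value can be from the optimum.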
