Applications in data-parallel computing typically consist of multiple stages. In each stage, a set of intermediate parallel data flows ( <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Coflow</i> ) is produced and transferred between servers to enable starting of next stage. While there has been much research on scheduling isolated coflows, the dependency between coflows in multi-stage jobs has been largely ignored. In this paper, we consider scheduling coflows of multi-stage jobs represented by general <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">DAG</i> s (Directed Acyclic Graphs) in a shared data center network, so as to minimize the total weighted completion time of jobs. This problem is significantly more challenging than the traditional coflow scheduling, as scheduling even a single multi-stage job to minimize its completion time is shown to be NP-hard. In this paper, we propose a polynomial-time algorithm with approximation ratio of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$O(\mu \log (m)/\log (\log (m)))$ </tex-math></inline-formula> , where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\mu $ </tex-math></inline-formula> is the maximum number of coflows in a job and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$m$ </tex-math></inline-formula> is the number of servers. For the special case that the jobs’ underlying dependency graphs are <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">rooted trees</i> , we modify the algorithm and improve its approximation ratio. To verify the performance of our algorithms, we present simulation results using real traffic traces that show up to 53% improvement over the prior approach. We conclude the paper by providing a result concerning an optimality gap for scheduling coflows with general DAGs.
Read full abstract