Abstract

For many data-parallel computing systems like Spark, a job usually consists of multiple computation stages and inter-stage communication (i.e., coflows). Many efforts have been made to schedule coflows and jobs independently. Simply combining coflow scheduling with job scheduling, however, prolongs the average job completion time (JCT) because the two schedulers conflict. For this reason, we propose a new scheduling unit, named coBranch, which takes the dependency between computation stages and coflows into account so that coflows and jobs can be scheduled jointly. Moreover, mainstream coflow schedulers are order-preserving, i.e., all coflows of a high-priority job are prioritized over those of a low-priority job. We observe that this order-preserving constraint incurs low inter-job parallelism. To overcome the problem, we employ an urgency-based mechanism to schedule coBranches, which decreases the average JCT by enhancing inter-job parallelism. We implement the urgency-based coBranch Scheduling (BS) method on Apache Spark, conduct prototype-based experiments, and evaluate the performance of our method against the shortest-job-first critical-path method and the FIFO method. Results show that our method achieves around 10 and 15 percent reduction in the average JCT, respectively. Large-scale simulations based on the Google trace show that our method performs even better there, reducing the average JCT by 23 and 35 percent, respectively.
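The coBranch abstraction itself is not formalized in this summary. As a rough illustration only, the sketch below (in Python, with hypothetical field names such as `input_coflow` and `est_compute_time`) shows one plausible way a coBranch could bundle a computation stage together with the coflow it depends on, so that both can be handed to a single scheduler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Coflow:
    """A group of flows carrying inter-stage data (sizes in bytes)."""
    coflow_id: str
    flow_sizes: List[int] = field(default_factory=list)

    @property
    def total_bytes(self) -> int:
        return sum(self.flow_sizes)


@dataclass
class CoBranch:
    """Illustrative scheduling unit: a computation stage plus the coflow
    that must finish before the stage can start. Field names are
    assumptions, not the paper's definitions."""
    job_id: str
    stage_id: str
    input_coflow: Coflow       # inter-stage communication the stage waits on
    est_compute_time: float    # estimated stage execution time (seconds)
```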

Highlights

  • To accelerate big data analytics, data-parallel frameworks such as Dryad [2], Hadoop [3] and Spark [4] partition large input data so that multiple computers process different data partitions concurrently

  • We propose the urgency-based coBranch Scheduling (BS) method to coordinately schedule the transmission of coflows with the execution of jobs, decreasing the average job completion time (JCT)

  • Simulation on the average JCT: We evaluate the performance of the three methods via trace replay, with 5000 jobs submitted within 600 s and assigned to 500 machines


Summary

INTRODUCTION

To accelerate big data analytics, data-parallel frameworks such as Dryad [2], Hadoop [3] and Spark [4] partition large input data so that multiple computers process different data partitions concurrently. Under order-preserving coflow scheduling, the computation tasks of a job can be executed only after the high-priority coflows of other jobs have been transmitted, which prolongs that job's JCT. Under the urgency-based scheduling mechanism, coBranches with high urgency should be prioritized even though they may belong to different jobs. We propose the distributed flow scheduling (DFS) method to coordinate the transmission of coflows with the execution of coBranches. Given a batch of jobs, the online BS method does not determine the priorities of all coBranches at once; instead, it updates the time-varying urgency during the execution of jobs and continuously makes scheduling decisions.
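The paper's actual urgency definition is not reproduced in this summary. The sketch below is a minimal Python illustration of the online pattern described above: each scheduling round recomputes a time-varying urgency for every ready coBranch (possibly from different jobs) and serves the most urgent first. The `urgency` formula and the dictionary keys (`remaining_work`, `submit_time`, `stage_id`) are stand-in assumptions, not the BS method itself.

```python
import heapq
from typing import List


def urgency(remaining_work: float, elapsed: float) -> float:
    """Hypothetical urgency score: grows as a coBranch waits longer
    relative to its remaining work, so long-waiting work rises in
    priority. The paper's time-varying definition may differ."""
    return elapsed / max(remaining_work, 1e-9)


def schedule_round(ready: List[dict], now: float) -> List[dict]:
    """One online scheduling round: re-rank all ready coBranches by
    their current urgency and return them in service order."""
    heap = []
    for cb in ready:
        u = urgency(cb["remaining_work"], now - cb["submit_time"])
        heapq.heappush(heap, (-u, cb["stage_id"], cb))  # max-heap via negation
    order = []
    while heap:
        _, _, cb = heapq.heappop(heap)
        order.append(cb)
    return order
```

This stand-in only illustrates that priorities are recomputed each round rather than fixed per job up front; the real urgency model and the DFS bandwidth allocation are covered in the method sections listed below.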

Job scheduling for data-parallel jobs
Coflow scheduling
Motivation of the coordinative scheduling mechanism
Problem formulation
Result
Model of the coBranch duration
URGENCY-BASED COBRANCH SCHEDULING METHOD
Motivation of urgency-based coBranch scheduling
Time-varying coBranch urgency
Exceeding time minimization
Overview
Distributed flow scheduling
Approximation ratio of the BS method
Performance improvement of the DFS method
Online BS method
IMPLEMENTATION
PERFORMANCE EVALUATION
Experiment evaluation on the performance
Experiment evaluation on the JCT improvement
Details
Experiment evaluation on system overheads
Experiment evaluation on the online performance
Evaluation on the prediction accuracy
Simulation on Google cluster data
Analysis
Simulations
Simulation on Facebook coflow data
CONCLUSION
