Abstract

Distributed applications usually feature a set of correlated flows between two consecutive computation stages. The scheduling of these flows has a crucial influence on job completion time. Coflow improves performance by optimizing the finish time of the entire set of flows. However, the flows and computing tasks in one application have more complex relationships that exceed the coflow's barrier assumption. In this context, scheduling via the coflow abstraction may hurt application performance. Accordingly, we propose metaflow, a traffic abstraction derived from the computation graph of the application. Metaflow reveals the detailed flow requirements of the application and makes it easier to reduce the job completion time. Based on metaflow, we first develop a mathematical model and formulate the scheduling problem as an integer linear programming (ILP) problem. To solve this ILP problem efficiently, we further prove through rigorous theoretical analysis that it has an equivalent linear programming (LP) problem. To demonstrate the effectiveness of scheduling with metaflow, we have conducted extensive simulations with both synthetic single jobs and production traces containing multiple jobs. The simulation results verify that our new scheduler adapts well to different jobs and achieves an average speedup of 2.87× on a real-life workload, compared to the state-of-the-art coflow scheduler.

Highlights

  • Datacenter networks are critical to the performance of distributed applications

  • We propose an algorithm to calculate the metaflow completion time (MCT) and successfully formulate the metaflow scheduling problem (MSP) expressed as an integer linear programming (ILP) model with optimal solutions. (§IV)

  • Using workload traces from real datacenters, we show that distributed applications can be sped up significantly through network scheduling with metaflow, compared with the state-of-the-art coflow scheduler. (§VI)

Summary

INTRODUCTION

Datacenter networks are critical to the performance of distributed applications. It is reported that, at times, 50% of the time taken to complete a job is spent on transferring data across the network [1]. Traditional scheduling algorithms focus on reducing flow completion time (FCT) [3]–[6] or improving per-flow fairness [7], [8]. Because they are based on the abstraction of individual flows, they cannot capture the communication semantics of a distributed application, and optimizing flow-level objectives can be at odds with application-level goals. Coflow assumes that a job cannot begin to process the next stage until all flows within the coflow have finished; that is, a barrier exists between two consecutive stages. Under this condition, minimizing the average coflow completion time (CCT) usually aligns with application-level performance, thereby decreasing job completion time (JCT). When the relationships between flows and computing tasks go beyond this barrier assumption, however, coflow cannot convey these application semantics to the network controller. To address this problem, in this paper we propose metaflow, a new application-oriented traffic abstraction that leverages the computation dependency graph to guide the network transfer. We compare the metaflow scheduler with three other schedulers in extensive experiments.
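The effect of the barrier assumption can be illustrated with a small toy calculation. The sketch below is not the paper's model or scheduler: it assumes a single hypothetical link of unit capacity that sends flows one at a time, tasks that all run in parallel, and made-up flow sizes and compute times. It only compares the job completion time when every task waits for all flows (barrier) with the case where each task starts as soon as its own input flows have arrived (dependency-aware).

```python
from itertools import permutations


def flow_finish_times(order, sizes, capacity=1.0):
    """Send flows back to back over one link; return the finish time of each flow."""
    t = 0.0
    finish = {}
    for f in order:
        t += sizes[f] / capacity
        finish[f] = t
    return finish


def jct_barrier(order, sizes, tasks):
    """All tasks wait until every flow has finished (coflow-style barrier)."""
    finish = flow_finish_times(order, sizes)
    barrier = max(finish.values())
    return max(barrier + compute for _needs, compute in tasks.values())


def jct_dependency_aware(order, sizes, tasks):
    """Each task starts once the flows it depends on have finished."""
    finish = flow_finish_times(order, sizes)
    return max(
        max(finish[f] for f in needs) + compute
        for needs, compute in tasks.values()
    )


# Hypothetical job: two flows of equal size, two tasks with different compute times.
sizes = {"f1": 2, "f2": 2}
tasks = {
    "A": (["f1"], 5),  # task A needs only f1 and computes for 5 time units
    "B": (["f2"], 1),  # task B needs only f2 and computes for 1 time unit
}

# Brute-force the best flow order for the dependency-aware case (fine for a toy job).
best_order = min(
    permutations(sizes),
    key=lambda o: jct_dependency_aware(o, sizes, tasks),
)

print("barrier JCT:", jct_barrier(["f1", "f2"], sizes, tasks))
print("dependency-aware JCT:", jct_dependency_aware(best_order, sizes, tasks))
print("best order:", best_order)
```

In this toy job both flow orders give the same total transfer time, so the barrier model cannot distinguish between them, whereas sending the flow needed by the longer task first shortens the job. The brute-force search over orders is used only for illustration and does not scale beyond a handful of flows.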

RELATED WORK
METAFLOW SCHEDULING PROBLEM
TRANSFORMATION INTO A NONLINEAR PROGRAMMING PROBLEM
TRANSFORMING THE NONLINEAR PROGRAMMING PROBLEM INTO AN LP
EXPERIMENTAL EVALUATION
COMPUTATIONAL COST
Findings
CONCLUSION