Abstract
To minimize a job's completion time, we need to minimize the completion time of its final stage's last task. Scheduling of machine slots and networks largely dominates the variable part of each task's duration. Finding an optimal schedule is NP-hard even for offline and simplified scenarios. Previous work does lead to improved performance with various strategies. State-of-the-art task placement and network scheduling efforts are largely disjunctive. Without joint optimization, they are sub-optimal and myopic in many scenarios. Task placement usually treats the network as a black box. Thus, we use prioritized bandwidth allocation among tasks making the network both <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">predictable</i> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">efficient</i> to achieve joint scheduling. With this feature, joint scheduling can be transformed into a special <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">bin-packing problem</i> . Over this minimal yet power-enough abstraction, we propose PushBox to schedule data-parallel jobs in multi-tenant clusters. When designing the joint scheduling algorithm, we not only embrace the wisdom of prior art but also respect administrators’ fairness intent, which is so far largely ignored. We implement PushBox on Hadoop 3. PushBox performs persistently well on both a small testbed and a trace-driven simulator.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Parallel and Distributed Systems
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.