Abstract

The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of n jobs on m distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into m “subjobs” and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-Hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.
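To make the objective concrete: in this model a job completes only when all of its subjobs have completed, and the goal is to minimize the weighted average of job completion times. The Python sketch below evaluates that objective for a given schedule; the function name and dict-based data layout are our own illustration, not notation from the paper.

    # Evaluate weighted average completion time for a schedule.
    # A job finishes when its last subjob (one per cluster) finishes.
    # The dict-based layout is illustrative, not the paper's notation.
    def weighted_avg_completion(subjob_finish, weights):
        """subjob_finish[j][c]: finish time of job j's subjob on cluster c;
        weights[j]: weight of job j."""
        total = 0.0
        for j, per_cluster in subjob_finish.items():
            c_j = max(per_cluster.values())  # job j is done when its slowest subjob is done
            total += weights[j] * c_j
        return total / sum(weights.values())

    # Two jobs on two clusters: j1 finishes at time 7, j2 at time 3.
    finish = {"j1": {"c1": 4.0, "c2": 7.0},
              "j2": {"c1": 3.0, "c2": 2.0}}
    w = {"j1": 1.0, "j2": 2.0}
    print(weighted_avg_completion(finish, w))  # (1*7 + 2*3) / 3 = 4.33...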

Highlights

  • It is becoming increasingly impractical to store full copies of large datasets on more than one data center [7]

  • Hung et al. modeled each cluster as having an arbitrary number of identical parallel machines and chose average job completion time as the objective

  • Hung et al. proposed a particular algorithm for the controller called “SWAG,” which performed well in a wide variety of simulations in which each data center was assumed to have the same number of identical parallel machines

Introduction

It is becoming increasingly impractical to store full copies of large datasets on more than one data center [7]. Commercial platforms such as AWS Lambda and Microsoft’s Azure Service Fabric are demonstrating a trend toward centralized cloud computing frameworks in which the user manages neither data flow nor server allocation [1, 11]. In view of these converging issues, the following scheduling problem arises: if computation is done locally to avoid excessive network traffic, how can individual clusters on the broader grid coordinate schedules for maximum throughput? Hung et al. modeled each cluster as having an arbitrary number of identical parallel machines and chose average job completion time as the objective. As this problem generalizes the NP-Hard concurrent open shop problem, they proposed a heuristic approach. We instead develop algorithms with provable guarantees, e.g., a 2-approximation when machines are of unit speed and subjobs are divided into equally sized (but not necessarily unit) tasks.
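The section list below includes “List Scheduling from Permutations,” a standard technique in this line of work: fix a permutation of the jobs, then greedily build a schedule from it. As a hedged illustration only (the tie-breaking rule, data model, and function name are our assumptions, not the paper’s specification), the Python sketch below assigns each task of a subjob to the currently least-loaded machine of one cluster, processing jobs in permutation order. Feeding the resulting per-cluster finish times into the objective sketch above would then score the full assignment.

    import heapq

    # Illustrative list scheduling on one cluster of identical machines:
    # jobs are taken in permutation order, and each task of a job's
    # subjob goes to the currently least-loaded machine. The greedy
    # tie-breaking here is an assumption, not the paper's rule.
    def list_schedule(permutation, tasks, num_machines):
        """tasks[j]: list of task processing times for job j's subjob.
        Returns the finish time of each job's subjob on this cluster."""
        loads = [(0.0, m) for m in range(num_machines)]  # (load, machine id)
        heapq.heapify(loads)
        finish = {}
        for j in permutation:
            subjob_finish = 0.0
            for p in tasks[j]:
                load, m = heapq.heappop(loads)  # least-loaded machine
                load += p
                subjob_finish = max(subjob_finish, load)
                heapq.heappush(loads, (load, m))
            finish[j] = subjob_finish
        return finish

    # Three jobs on two machines, with a permutation from some sequencing rule.
    print(list_schedule(["a", "b", "c"],
                        {"a": [2.0, 2.0], "b": [3.0], "c": [1.0, 1.0, 1.0]},
                        num_machines=2))  # {'a': 2.0, 'b': 5.0, 'c': 5.0}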

Formal Problem Statement
Example Problem Instances
Related Work
A permutation of the authors’ names
The Core Linear Program
Statement of LP1
Proof of LP1’s Validity
Theoretical Complexity of LP1
List Scheduling from Permutations
An LP-based Algorithm
CC-LP for Uniform Machines
CC-LP for Identical Machines
Combinatorial Algorithms
A Degenerate Case for SWAG
CC-TSPT with Unit Tasks and Identical Machines
CC-ATSPT: Augmenting the LP Relaxation
Closing Remarks