Robust and Probabilistic Failure-Aware Placement

Madhukar Korupolu, Rajmohan Rajaraman

doi:10.1145/3210367

Abstract

Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this article studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability), and the adversary can remove any subset of nodes of total weight at most a given bound W; our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure, and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ₂, the second level of the polynomial hierarchy; (ii) a variant of the problem that we call RobustFap (for Robust Failure-Aware Placement) is co-NP-hard; and (iii) an all-or-nothing version of RobustFap is Σ₂-complete. We then give a polynomial-time approximation scheme (PTAS) for RobustFap, a key ingredient of which is a solution that we design for a fractional version of RobustFap. We then study HierRobustFap, which is the fractional RobustFap problem over a hierarchical network, in which failures can occur at any subset of nodes in the hierarchy, and a failure at a node can adversely impact all of its descendants in the hierarchy. To solve HierRobustFap, we introduce a notion of hierarchical max-min fairness and a novel Generalized Spreading algorithm, which is simultaneously optimal for every upper bound W on the total weight of nodes that an adversary can fail. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights, and hierarchical structures.
Using randomized rounding, we extend this to give an algorithm for integral HierRobustFap. For the probabilistic version, we first give an algorithm that achieves an additive ϵ approximation in the failure probability for the single-level version, called ProbFap, while giving up a (1 + ϵ) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFap, achieving an ϵ additive approximation in failure probability while giving up an (L + ϵ) multiplicative factor in the number of failures, where L is the number of levels in the hierarchy.
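
To make the two failure models concrete, the following Python sketch evaluates a fixed single-level placement under each objective. It is only an illustration of the objectives, not the article's PTAS or Generalized Spreading algorithm. Several things here are assumptions that do not come from the abstract: the function names, the reading of "disruption" as the number of tasks lost, non-negative integer node weights, and independent node failures.

```python
from typing import Sequence


def worst_case_loss(tasks: Sequence[int], weights: Sequence[int], budget: int) -> int:
    """Adversarial model (assumed reading): the most tasks an adversary can
    destroy by failing nodes whose total weight is at most `budget`.
    This is a 0/1 knapsack over nodes with non-negative integer weights."""
    best = [0] * (budget + 1)  # best[b] = max tasks lost using weight <= b
    for t, w in zip(tasks, weights):
        # Iterate budgets downward so each node is failed at most once.
        for b in range(budget, w - 1, -1):
            best[b] = max(best[b], best[b - w] + t)
    return best[budget]


def survival_probability(tasks: Sequence[int], fail_probs: Sequence[float], k: int) -> float:
    """Probabilistic model: probability that at least k tasks survive,
    assuming nodes fail independently and a task survives iff its node does."""
    n = sum(tasks)
    dist = [1.0] + [0.0] * n  # dist[s] = P(exactly s tasks alive so far)
    for t, p in zip(tasks, fail_probs):
        new = [0.0] * (n + 1)
        for s, q in enumerate(dist):
            if q == 0.0:
                continue
            new[s] += q * p                 # node fails: its t tasks are lost
            new[s + t] += q * (1.0 - p)     # node survives: its t tasks stay alive
        dist = new
    return sum(dist[k:])


# Hypothetical example: 4 tasks on 3 nodes with weights 3, 1, 1 and W = 2.
spread_on_reliable = worst_case_loss([2, 1, 1], [3, 1, 1], 2)  # 2 tasks lost
packed_on_weak = worst_case_loss([1, 1, 2], [3, 1, 1], 2)      # 3 tasks lost
```

In this hypothetical instance, putting more tasks on the heavier (more reliable) node lowers the worst-case loss from 3 to 2, which matches the intuition that placement should account for reliability weights. Finding the best placement, rather than scoring a given one, is the hard part that the article addresses.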

Journal: ACM Transactions on Parallel Computing
Publication Date: Mar 31, 2018
Citations: 3

