Algorithms for optimal replica placement under correlated failure in hierarchical failure domains

K Alex Mills, R Chandrasekaran, Neeraj Mittal

doi:10.1016/j.tcs.2020.01.004

Open Access

https://doi.org/10.1016/j.tcs.2020.01.004

Abstract

In data centers, data replication is the primary method used to ensure availability of customer data. To avoid correlated failure, cloud storage infrastructure providers model hierarchical failure domains using a tree, and avoid placing a large number of data replicas within the same failure domain (i.e., on the same branch of the tree). Typical best practices ensure that replicas are distributed across failure domains, but relatively little is known concerning optimization algorithms for distributing data replicas. Using a hierarchical model, we answer how to distribute replicas across failure domains optimally. We formulate a novel optimization problem for replica placement in data centers. As part of our problem, we formalize and present a new criterion for optimizing a replica placement. Our overall goal is to choose placements in which correlated failures disable as few replicas as possible. In this work, we provide two optimization algorithms for dependency models represented by trees. We first present an O(n + ρ log ρ) time dynamic programming algorithm for optimally placing ρ replicas of a single block on the leaves (representing servers) of a tree with n vertices. We next consider the problem of optimally placing replicas of multiple blocks of data, where every block may have a different replication factor. For this problem, we give a dynamic programming algorithm that runs in O(nρmax3δ2mpoly(δ)) time, where m denotes the number of blocks, ρmax denotes the maximum replication factor of a block, and δ denotes the maximum difference in the replication factors of any two blocks. The running time of the algorithm is polynomial when δ, which we refer to as the skew, is a constant.
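
To make the tree model concrete, the following minimal Python sketch counts how many replicas of a single block sit inside each failure domain for a given placement, which is the number of replicas a failure of that domain would disable. It is an illustration only and is not the dynamic programming algorithm from the paper. The tree encoding (a `parent` dictionary), the node names, and the helper `worst_domain_load` are assumptions made for this example; in particular, "largest count in any non-root domain" is a simplified stand-in and not the optimization criterion the paper formalizes.

```python
from collections import defaultdict


def replicas_per_domain(parent, placement):
    """Count the replicas located in the subtree of every node.

    parent:    dict mapping each node to its parent (the root maps to None).
    placement: iterable of leaf nodes (servers), one replica per entry.
    """
    counts = defaultdict(int)
    for leaf in placement:
        node = leaf
        # Every ancestor of the server is a failure domain containing it.
        while node is not None:
            counts[node] += 1
            node = parent[node]
    return dict(counts)


def worst_domain_load(parent, placement):
    """Largest replica count in any non-root failure domain (illustrative)."""
    counts = replicas_per_domain(parent, placement)
    return max(c for node, c in counts.items() if parent[node] is not None)


# Hypothetical data center: three racks with two servers each.
parent = {
    "dc": None,
    "rack1": "dc", "rack2": "dc", "rack3": "dc",
    "s1": "rack1", "s2": "rack1",
    "s3": "rack2", "s4": "rack2",
    "s5": "rack3", "s6": "rack3",
}

clustered = ["s1", "s2", "s3"]  # two replicas share rack1
spread = ["s1", "s3", "s5"]     # one replica per rack

print(worst_domain_load(parent, clustered))
print(worst_domain_load(parent, spread))
```

In this toy tree, losing `rack1` disables two of the three replicas under the clustered placement but only one under the spread placement, which is the kind of difference a placement criterion has to capture.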
