This paper investigates a stochastic model of a large distributed system in which users’ files are duplicated on unreliable data servers. When a server breaks down, a copy of a file can be lost; it can be retrieved if another copy of the same file is stored on another server. If no other copy of a given file is present in the network, the file is definitively lost. To maintain multiple copies of a given file, it is assumed that each server can devote a fraction of its processing capacity to duplicating files on other servers, which enhances the durability of the system. A simplified stochastic model of this network is analyzed. It is assumed that a copy of a given file is lost at some fixed rate and that the initial state is optimal: each file has the maximum number $d$ of copies located on the servers of the network. The duplication capacity is allocated to the files with the lowest number of copies. Due to random losses, the state of the network is transient and all files are eventually lost. As a consequence, a transient $d$-dimensional Markov process $(X(t))$ with a unique absorbing state describes the evolution of this network. Taking a scaling parameter $N$ related to the number of nodes of the network, a scaling analysis of this process is developed. The asymptotic behavior of $(X(t))$ is analyzed on time scales of the type $t\mapsto N^{p}t$ for $0\leq p\leq d-1$. The paper derives asymptotic results on the decay of the network: under a stability assumption, the main results state that the critical time scale for the decay of the system is given by $t\mapsto N^{d-1}t$. In particular, the time after which a fixed fraction of files is lost is of the order of $N^{d-1}$. When the stability condition is not satisfied, that is, when the network is initially overloaded, it is shown that the state of the network converges to an interesting local equilibrium, which is investigated.
These results shed some light on the role of the key parameters $\lambda$, the duplication rate, and $d$, the maximal number of copies, in the design of these systems. The techniques used involve careful stochastic calculus for Poisson processes, technical estimates and the proof of a stochastic averaging principle.
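To make the dynamics described above concrete, the following is a minimal simulation sketch, not the paper's construction. It tracks only the number of files with $k$ copies, $0\leq k\leq d$, and rests on several assumptions that the abstract does not state: each copy is lost at a constant rate `mu`, the total duplication capacity of the network is `lam * n_servers`, this capacity is given one file at a time to a file with the lowest positive number of copies, and there are `n_files` files in total. All function and parameter names are hypothetical.

```python
import random


def simulate_decay(n_servers, d, lam, mu, n_files, lost_fraction, seed=0):
    """Toy Gillespie-style simulation of the file-loss dynamics.

    Assumptions (not taken from the paper): per-copy loss rate `mu`,
    total duplication capacity `lam * n_servers`, and `n_files` files.
    x[k] is the number of files that currently have k copies; x[0]
    counts the files that are definitively lost.
    Returns (time, x) when a fraction `lost_fraction` of files is lost.
    """
    if not 0 < lost_fraction <= 1:
        raise ValueError("lost_fraction must be in (0, 1]")
    rng = random.Random(seed)
    x = [0] * (d + 1)
    x[d] = n_files  # optimal initial state: every file has d copies
    t = 0.0
    target = lost_fraction * n_files

    while x[0] < target:
        # Each entry is (rate, source level k, destination level).
        events = []
        for k in range(1, d + 1):
            if x[k] > 0:
                # One of the k copies of one of the x[k] files is lost.
                events.append((mu * k * x[k], k, k - 1))
        # Duplication serves the lowest level that still has a copy
        # and has not yet reached the maximum d.
        lowest = next((k for k in range(1, d) if x[k] > 0), None)
        if lowest is not None:
            events.append((lam * n_servers, lowest, lowest + 1))

        rates = [rate for rate, _, _ in events]
        t += rng.expovariate(sum(rates))  # time to the next event
        _, src, dst = rng.choices(events, weights=rates)[0]
        x[src] -= 1
        x[dst] += 1

    return t, x


if __name__ == "__main__":
    time_lost, state = simulate_decay(
        n_servers=5, d=2, lam=1.0, mu=1.0, n_files=10, lost_fraction=0.5
    )
    print(f"half of the files lost at t = {time_lost:.2f}, state = {state}")
```

Repeating such runs for increasing `n_servers` with `d` fixed gives a rough empirical feel for how the decay time grows with the size of the network, although a toy experiment of this kind is no substitute for the asymptotic results.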