Abstract

We consider the data shuffling problem, in which a master node is connected to a set of worker nodes, via a shared link, in order to communicate a set of files to the worker nodes. The master node has access to a database of files. In every shuffling iteration, each worker node processes a new subset of files, and has excess storage to partially cache the remaining files. We characterize the exact rate-memory trade-off for the worst-case shuffling under the assumption that cached files are uncoded, by deriving the minimum communication rate for a given storage capacity per worker node. As a byproduct, the exact rate-memory trade-off for any random shuffling is characterized when the number of files is equal to the number of worker nodes. We propose a novel deterministic and systematic coded shuffling scheme, which improves the state of the art. Then, we prove the optimality of our proposed scheme by deriving a matching lower bound and showing that the placement phase of the proposed coded shuffling scheme is optimal over all shuffles.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call