Abstract

In contrast to time series, graphical data is data indexed by the vertices and edges of a graph. Modern applications such as the internet, social networks, genomics and proteomics generate graphical data, often at large scale. The large scale argues for the need to compress such data for storage and subsequent processing. Since this data might have several components available in different locations, it is also important to study distributed compression of graphical data. In this paper, we derive a rate region for this problem which is a counterpart of the Slepian–Wolf theorem. We characterize the rate region when the statistical description of the distributed graphical data can be modeled as being one of two types – as a member of a sequence of marked sparse Erdős–Rényi ensembles or as a member of a sequence of marked configuration model ensembles. Our results are in terms of a generalization of the notion of entropy introduced by Bordenave and Caputo in the study of local weak limits of sparse graphs. Furthermore, we give a generalization of this result for Erdős–Rényi and configuration model ensembles with more than two sources.
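To make the setting concrete, the following is a minimal Python sketch of a marked sparse Erdős–Rényi graph whose marks are split between two sources. It is an illustration under stated assumptions, not the paper's construction: the function names (`sample_marked_er`, `project`), the uniform i.i.d. mark distributions, and the convention that `None` in an edge mark means "this edge is absent at that source" are all hypothetical choices made for the example.

```python
import random
from itertools import combinations


def sample_marked_er(n, avg_degree, vertex_marks, edge_marks, seed=None):
    """Sample a marked sparse Erdős–Rényi graph on vertices 0..n-1.

    Every possible edge is present independently with probability
    avg_degree / n, so the expected degree is roughly avg_degree and
    stays bounded as n grows (the sparse regime).
    Assumption: marks are drawn i.i.d. uniformly from the given lists.
    """
    rng = random.Random(seed)
    p = min(1.0, avg_degree / n)
    vmarks = {v: rng.choice(vertex_marks) for v in range(n)}
    emarks = {}
    for u, v in combinations(range(n), 2):  # pairs with u < v
        if rng.random() < p:
            emarks[(u, v)] = rng.choice(edge_marks)
    return vmarks, emarks


def project(vmarks, emarks, source):
    """Return the graph seen by one source (0 or 1).

    Every mark is a pair; the source keeps its own component.
    Assumption: an edge whose component is None is not visible there.
    """
    v_out = {v: m[source] for v, m in vmarks.items()}
    e_out = {e: m[source] for e, m in emarks.items() if m[source] is not None}
    return v_out, e_out


# Hypothetical mark alphabets: each mark is (component at source 1, component at source 2).
VERTEX_MARKS = [("a", "x"), ("a", "y"), ("b", "x")]
EDGE_MARKS = [("r", "s"), ("r", None), (None, "s")]

joint_v, joint_e = sample_marked_er(500, 3.0, VERTEX_MARKS, EDGE_MARKS, seed=0)
v1, e1 = project(joint_v, joint_e, 0)
v2, e2 = project(joint_v, joint_e, 1)
print(len(joint_e), len(e1), len(e2))  # joint edge count vs. per-source edge counts
```

In this sketch each source sees a subgraph of the joint graph, so the per-source edge counts are at most the joint count. The distributed compression problem asks how few bits each source needs to send, without seeing the other's data, so that a decoder can recover the joint marked graph.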

Highlights

  • Nowadays, storing and processing data that in its native form is indexed by combinatorial objects other than just linearly ordered time or multidimensional arrays is of great importance in many applications such as the internet, social networks and biology

  • A social network can be represented as a graph where each vertex models an individual and each edge stands for a friendship

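  • If R is the rate region for the sequence of ensembles corresponding to $\mu_{1,2}$, as defined in Section II, then membership of a rate tuple $(\alpha_1, R_1, \alpha_2, R_2)$ in R is characterized by conditions such as (17a), which compares $(\alpha_1, R_1)$ with $((d_{1,2} - d_2)/2, \Sigma(\mu_1|\mu_2))$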


Summary

INTRODUCTION

Nowadays, storing and processing data that in its native form is indexed by combinatorial objects other than just linearly ordered time or multidimensional arrays is of great importance in many applications such as the internet, social networks and biology. In the classical time-series setting, distributed compression is modeled using two (or more) possibly dependent jointly stationary and ergodic processes representing the components of the data at the individual locations. In this case, the rate region, which characterizes how efficiently the data can be compressed, is given by the Slepian–Wolf theorem [7]. To the highest order, the marked BC entropy captures the part of the overall entropy that truly depends on the empirical characteristics of the graphical data and not just on the underlying connectivity structure of the graph. This motivates the marked BC entropy as a natural measure governing the asymptotic compression bounds, since it is sensitive to the details of the statistics of the ensembles and scales linearly with the number of vertices of the underlying graph.
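For reference, the classical Slepian–Wolf region mentioned above can be written out as follows. This is the standard statement for two sources emitting i.i.d. pairs $(X_1, X_2)$, included here as a reminder rather than taken from the paper. For jointly stationary and ergodic processes, the entropies are replaced by the corresponding entropy rates.

```latex
% Slepian–Wolf rate region for two correlated sources X_1, X_2
% encoded separately at rates R_1, R_2 and decoded jointly:
\begin{align*}
  R_1       &\ge H(X_1 \mid X_2), \\
  R_2       &\ge H(X_2 \mid X_1), \\
  R_1 + R_2 &\ge H(X_1, X_2).
\end{align*}
```

The paper's result is described as a counterpart of this theorem for graphical data, with the marked BC entropy taking the place of the Shannon entropy. The conditional quantity $\Sigma(\mu_1|\mu_2)$ quoted in the highlights appears to play the role of $H(X_1 \mid X_2)$.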

PROBLEM STATEMENT
THE FRAMEWORK OF LOCAL WEAK CONVERGENCE
THE BC ENTROPY
MAIN RESULTS
Proof of Achievability for the Erdős–Rényi case
Proof of Achievability for the Configuration model
Proof of the Converse for the Erdős–Rényi case
Proof of the Converse for the Configuration Model
Generalization to more than two sources
CONCLUSION
Proof of converse
Proof of achievability for the Erdős–Rényi ensemble
Proof of achievability for the configuration model ensemble
Local Weak Limit of the Sequence of Erdős–Rényi Ensembles
Local Weak Limit of the Sequence of Configuration Model Ensembles
Alternating Red-Blue Regular Rooted Tree