Abstract

Information theory traditionally deals with "conventional data," be it textual, image, or video data. However, databases of various sorts have come into existence in recent years for storing "unconventional data," including biological data, social data, web data, topographical maps, and medical data. In compressing such data, one must consider two types of information: the information conveyed by the structure itself, and the information conveyed by the data labels implanted in the structure. In this paper, we address the former problem by studying the information of graphical structures (i.e., unlabeled graphs). As a first step, we consider the Erdős–Rényi graphs G(n,p) over n vertices, in which edges are added independently and randomly with probability p. We prove that the structural entropy of G(n,p) is (n choose 2) h(p) - log n! + o(1) = (n choose 2) h(p) - n log n + O(n), where h(p) = -p log p - (1-p) log(1-p) is the entropy rate of a conventional memoryless binary source. Then, we propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the n log n term (i.e., the first two leading terms). Our algorithm runs either in time O(n^2) in the worst case for any graph, or in time O(n + e) on average for graphs generated by G(n,p), where e is the average number of edges. To the best of our knowledge, this is the first provably (asymptotically) optimal graph compressor for Erdős–Rényi graph models. We use combinatorial and analytic techniques such as generating functions, the Mellin transform, and Poissonization to establish these findings. Our experiments confirm the theoretical results and also show the usefulness of our algorithm for some real-world graphs such as the Internet, biological networks, and social networks.
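The two leading terms of the structural entropy stated above are straightforward to evaluate numerically. The following sketch (function names are our own, not from the paper) computes the binary entropy h(p) and the approximation (n choose 2) h(p) - log n!, using the log-gamma function to evaluate log n! without overflow:

```python
import math

def h(p):
    """Binary entropy (in bits) of a memoryless source with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def structural_entropy_estimate(n, p):
    """Leading terms of the structural entropy of G(n, p):
    (n choose 2) * h(p) - log2(n!), ignoring the o(1) remainder.
    log2(n!) is computed via lgamma(n + 1) to avoid overflow for large n."""
    pairs = n * (n - 1) / 2          # number of possible edges, (n choose 2)
    log2_factorial = math.lgamma(n + 1) / math.log(2)
    return pairs * h(p) - log2_factorial
```

For example, a labeled G(n, p) graph requires about (n choose 2) h(p) bits, and the log n! deduction reflects the savings from discarding vertex labels (each unlabeled structure corresponds to up to n! labeled graphs).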
