Abstract

Zuckerli is a scalable compression system meant for large real-world graphs. Graphs are notoriously challenging structures to store efficiently because of their linked nature, which makes it hard to separate them into smaller, compact components. Effective compression is therefore crucial when dealing with large graphs, which can have billions of nodes and edges. Furthermore, a good compression system should give the user fast and reasonably flexible access to parts of the compressed data without requiring full decompression, which may be infeasible on their system. Compared to WebGraph, the de facto standard for compressing real-world graphs, Zuckerli improves on multiple aspects by using advanced compression techniques and novel heuristic graph algorithms. Zuckerli can produce both a compressed representation for storage and one that allows fast direct access to the adjacency lists of the compressed graph without decompressing the entire graph. We validate its effectiveness on real-world graphs with up to a billion nodes and 90 billion edges, conducting an extensive experimental evaluation of both compression density and decompression performance. We show that Zuckerli-compressed graphs are 10% to 29% smaller than their WebGraph counterparts, and more than 20% smaller in most cases, while requiring decompression resources comparable to those of WebGraph.

Highlights

  • Graph compression essentially boils down to compressing the adjacency lists of a graph G = (V, E), whose nodes are suitably numbered from 1 to n = |V|, and where the adjacency list of each node is seen as the sorted sequence of the integers from {1, 2, ..., n} that identify its neighbors

  • The challenge is to use very few bits per edge and node, so as to squeeze G into as little space as possible. This can make a dramatic difference for massive graphs when the compressed graph fits into main memory but its standard representation does not

  • We describe Zuckerli, a novel compression algorithm and compressed data structure designed for very large graphs
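The highlights reduce graph compression to compressing sorted integer sequences. The sketch below illustrates that reduction under simple assumptions; it is not Zuckerli's own integer coding. Each sorted adjacency list is stored as its degree followed by the gaps between consecutive neighbors, every value is written as a LEB128-style byte varint, and a per-node offset array lets a single list be decoded without decompressing the rest. The function names (`compress_graph`, `neighbors_of`, `bits_per_edge`) are hypothetical, and nodes are numbered from 0 here to match Python indexing.

```python
from typing import List, Tuple


def encode_varint(value: int, out: bytearray) -> None:
    """Append a non-negative integer as a byte varint (7 payload bits per byte)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def compress_graph(adj: List[List[int]]) -> Tuple[bytes, List[int]]:
    """Gap-encode each sorted adjacency list and record where each list starts."""
    out = bytearray()
    offsets = []
    for neighbors in adj:
        offsets.append(len(out))
        encode_varint(len(neighbors), out)  # degree first
        prev = 0
        for v in neighbors:  # sorted lists give small, non-negative gaps
            encode_varint(v - prev, out)
            prev = v
    return bytes(out), offsets


def neighbors_of(data: bytes, offsets: List[int], node: int) -> List[int]:
    """Decode one adjacency list directly, without touching the others."""
    pos = offsets[node]
    degree, pos = decode_varint(data, pos)
    result, prev = [], 0
    for _ in range(degree):
        gap, pos = decode_varint(data, pos)
        prev += gap
        result.append(prev)
    return result


def bits_per_edge(data: bytes, adj: List[List[int]]) -> float:
    """Size of the encoded lists in bits per edge (offset array not counted)."""
    edges = sum(len(neighbors) for neighbors in adj)
    if edges == 0:
        return 0.0
    return 8 * len(data) / edges


if __name__ == "__main__":
    adj = [[1, 2, 5], [0, 2], [0, 1, 3, 4, 5], [2], [2, 5], [0, 2, 4]]
    data, offsets = compress_graph(adj)
    print(neighbors_of(data, offsets, 2))  # prints [0, 1, 3, 4, 5]
    print(bits_per_edge(data, adj))        # prints 11.0
```

A byte varint spends at least 8 bits on every value, so the toy figure of 11 bits per edge is far from what dedicated entropy-coded schemes reach; the point is only that sorting turns neighbor identifiers into small gaps, and that offsets make direct access to one list possible.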


Summary

INTRODUCTION

The analysis conducted shows that, while Log(Graph) achieves better performance when performing various operations, the WebGraph framework is still the most competitive approach in terms of compression ratio, especially for web graphs. Another well-known approach to graph compression is the k²-tree [8], which uses a succinct representation of a bidimensional k-tree built on the adjacency matrix of the graph. Other approaches follow a different philosophy: they provide access to the compressed graph through a wide range of complex operations, or even a query language, at the cost of sub-optimal compression ratios. This is the case, for example, of ZipG [11], a distributed graph storage system that aims at compactly storing a graph, including semantic information on its nodes and edges, while allowing access to this information via a minimal but rich API.
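To make the k²-tree description more concrete, here is a minimal sketch of the commonly used level-order layout, assuming k = 2. The adjacency matrix is padded to a power of k, each non-empty submatrix is split into k² children, and one bit per child records whether that child contains an edge. Bits of all internal levels go into a bitmap T, bits of the last level into L, and the children of a 1-bit at position x start at rank1(T, x) · k², where rank1 counts the 1s in T up to and including x. The function names are hypothetical; construction scans the whole edge list for every cell, which only suits toy inputs, and rank is a plain prefix-sum array rather than the constant-time succinct rank structure a real implementation would use.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def build_k2_tree(n: int, edges: Set[Edge], k: int = 2) -> Tuple[List[int], List[int], int]:
    """Return (T, L, size): internal-level bits, last-level bits, padded side."""
    size = k
    while size < n:  # pad the n x n matrix to a power of k
        size *= k
    levels: List[List[int]] = []
    frontier = [(0, 0)]  # top-left corners of non-empty submatrices
    side = size
    while side > 1:
        side //= k
        bits, next_frontier = [], []
        for r0, c0 in frontier:
            for i in range(k):
                for j in range(k):
                    r, c = r0 + i * side, c0 + j * side
                    # Naive emptiness test: scan every edge (toy inputs only).
                    nonempty = any(r <= u < r + side and c <= v < c + side
                                   for u, v in edges)
                    bits.append(1 if nonempty else 0)
                    if nonempty:
                        next_frontier.append((r, c))
        levels.append(bits)
        frontier = next_frontier  # only non-empty blocks get children
    T = [b for level in levels[:-1] for b in level]
    L = levels[-1]
    return T, L, size


def has_edge(T: List[int], L: List[int], size: int, row: int, col: int, k: int = 2) -> bool:
    """Descend from the root, using rank over T to find each node's children."""
    prefix = [0]
    for b in T:  # prefix[i] = number of 1s in T[0:i]
        prefix.append(prefix[-1] + b)
    base, side = 0, size
    while True:
        side //= k
        x = base + (row // side) * k + (col // side)
        row, col = row % side, col % side
        if x >= len(T):  # positions past T index the last level
            return L[x - len(T)] == 1
        if T[x] == 0:  # empty submatrix: no edge inside it
            return False
        base = prefix[x + 1] * k * k  # children of the 1-bit at x


if __name__ == "__main__":
    edges = {(0, 1), (3, 2)}
    T, L, size = build_k2_tree(4, edges)
    print(T, L)  # prints [1, 0, 0, 1] [0, 1, 0, 0, 0, 0, 1, 0]
    print(has_edge(T, L, size, 3, 2), has_edge(T, L, size, 1, 1))  # prints True False
```

Empty regions of the matrix cost a single 0 bit at the highest level where they appear, which is why the structure pays off on sparse, clustered adjacency matrices.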

ENCODING INTEGERS
HYBRID INTEGER ENCODING
BRIEF SUMMARY OF WEBGRAPH
ZUCKERLI SCHEME
EXPERIMENTS
DATASETS
Findings
CONCLUSIONS