Abstract

A multitude of contemporary applications heavily involve graph data whose size appears to be ever-increasing. This trend shows no signs of subsiding and has led to the emergence of a number of distributed graph processing systems, including Pregel, Apache Giraph, and GraphX. However, the unprecedented scale now reached by real-world graphs complicates graph processing due to excessive memory demands, even in distributed environments. By and large, contemporary graph processing systems employ ineffective in-memory representations of adjacency lists. Memory usage patterns therefore emerge as a primary concern in distributed graph processing. We seek to address this challenge by exploiting empirically observed properties of graphs generated by human activity. In this paper, we propose 1) three compressed adjacency list representations that can be applied to any distributed graph processing system, 2) a variable-byte encoded representation of out-edge weights for space-efficient support of weighted graphs, and 3) a tree-based compact out-edge representation that allows for efficient mutations on the graph elements. We experiment with publicly available graphs whose size reaches two billion edges and report our findings in terms of both space efficiency and execution time. Our suggested compact representations reduce the memory required to accommodate the graph elements by up to 5 times compared with state-of-the-art methods. At the same time, our memory-optimized methods retain the efficiency of uncompressed structures and enable the execution of algorithms on large-scale graphs in settings where contemporary alternative structures fail due to memory errors.
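
The variable-byte (varint) scheme mentioned above is a standard technique for packing small non-negative integers, such as edge weights, into as few bytes as needed. The following is a minimal generic sketch of that encoding in Java; the class and method names (VarByte, encode, decode) are illustrative assumptions and not taken from the paper, and this is not the authors' implementation.

import java.io.ByteArrayOutputStream;

// Minimal sketch of standard variable-byte (varint) encoding, as commonly
// used to compress small non-negative integers such as edge weights.
// Names here are illustrative only, not the paper's API.
public class VarByte {

    // Encode a non-negative integer using 7 bits per byte; the high bit
    // of each byte marks whether another byte follows.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // continuation bit set
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    // Decode a single varint starting at the given offset.
    public static int decode(byte[] bytes, int offset) {
        int value = 0, shift = 0;
        while (true) {
            byte b = bytes[offset++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // A weight of 300 occupies two bytes instead of a fixed four.
        byte[] encoded = encode(300);
        System.out.println(encoded.length + " bytes, decoded: " + decode(encoded, 0));
    }
}

Small edge weights (the common case in graphs generated by human activity) thus occupy a single byte, which is the space saving such a representation aims for.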
