Abstract

How can we analyze large graphs such as the Web, and social networks with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to perform various graph mining algorithms on such large graphs, they have difficulties in processing Web-scale graphs due to massive communication and I/O costs caused by communication between workers, and reading subgraphs repeatedly. In this paper, we propose FlexGraph, a scalable distributed graph mining method reducing the costs by exploiting properties of real-world graphs. FlexGraph significantly decreases the communication cost, which is the main bottleneck of distributed systems, by exploiting different edge placement policies based on types of vertices. Furthermore, we propose a flexible storage format to reduce I/O costs when reading input graph repeatedly. Experiments show that FlexGraph succeeds in processing up to 64× larger graphs than existing distributed memory-based graph mining methods, and consistently outperforms previous disk-based graph mining methods.

Highlights

  • How can we analyze enormous networks like the Web and social networks which have hundreds of billions of vertices and edges? Graph mining algorithms such as shortest path computation, PageRank, connected component computation, and random walk with restart enable many network analyses

  • FlexGraph: Flexible partitioning and storage for scalable graph mining based on single type of messages, we experimentally show that the method achieves smaller amount of communication cost than those of other methods if we choose proper out-degree threshold θopt

  • We evaluate the scalability of FlexGraph for processing large-scale graphs under two scenarios

Read more

Summary

Introduction

How can we analyze enormous networks like the Web and social networks which have hundreds of billions of vertices and edges? Graph mining algorithms such as shortest path computation, PageRank, connected component computation, and random walk with restart enable many network analyses. Most of the distributed graph mining systems, have problems in handling a very large graph because of massive communication and I/O costs. Disk-based systems such as Hama [18], and PEGASUS [1] increase their scalability by exploiting distributed file systems like HDFS [19] along with local file system of each machine These systems cannot handle very large graphs as they require a lot of communication through network and disk I/Os, which are well-known causes of performance degradation. We propose FlexGraph, a new scalable graph processing method on distributed systems, utilizing real-world graph properties to reduce communication and I/O costs dramatically. We propose FlexGraph, a new scalable distributed graph mining system, which dramatically reduces the communication cost by specially handling high-degree vertices. The codes and datasets used in this paper are publicly available at https://github.com/snudatalab/FlexGraph

Background and related work
Experiments
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call