FlexGraph: Flexible partitioning and storage for scalable graph mining.

Chiwan Park, Ha-Myung Park, U Kang, Roland Bouffanais

doi:10.1371/journal.pone.0227032

Abstract

How can we analyze large graphs, such as the Web and social networks, with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to run various graph mining algorithms on such large graphs, they have difficulty processing Web-scale graphs because of the massive communication and I/O costs caused by communication between workers and by repeatedly reading subgraphs. In this paper, we propose FlexGraph, a scalable distributed graph mining method that reduces these costs by exploiting properties of real-world graphs. FlexGraph significantly decreases the communication cost, which is the main bottleneck of distributed systems, by applying different edge placement policies depending on the type of vertex. Furthermore, we propose a flexible storage format to reduce I/O costs when the input graph is read repeatedly. Experiments show that FlexGraph succeeds in processing up to 64× larger graphs than existing distributed memory-based graph mining methods, and consistently outperforms previous disk-based graph mining methods.

Highlights

How can we analyze enormous networks like the Web and social networks, which have hundreds of billions of vertices and edges? Graph mining algorithms such as shortest path computation, PageRank, connected component computation, and random walk with restart enable many network analyses.
Based on a single type of message, we experimentally show that the method incurs a smaller communication cost than other methods if we choose a proper out-degree threshold θopt.
We evaluate the scalability of FlexGraph for processing large-scale graphs under two scenarios.

Summary

Introduction

How can we analyze enormous networks like the Web and social networks, which have hundreds of billions of vertices and edges? Graph mining algorithms such as shortest path computation, PageRank, connected component computation, and random walk with restart enable many network analyses. Most distributed graph mining systems have problems handling a very large graph because of massive communication and I/O costs. Disk-based systems such as Hama [18] and PEGASUS [1] increase their scalability by exploiting distributed file systems like HDFS [19] along with the local file system of each machine. However, these systems cannot handle very large graphs, as they require a lot of network communication and disk I/O, which are well-known causes of performance degradation. We propose FlexGraph, a new scalable distributed graph mining system that uses real-world graph properties to dramatically reduce communication and I/O costs, in particular by specially handling high-degree vertices. The code and datasets used in this paper are publicly available at https://github.com/snudatalab/FlexGraph
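To make the idea of handling high-degree vertices differently more concrete, the following is a minimal sketch of a degree-dependent edge placement. It is not FlexGraph's actual implementation: the function `place_edges`, the modulo-based worker assignment, and the two policies (place by source for low-degree vertices, place by destination for high-degree vertices) are assumptions made for illustration. Only the use of an out-degree threshold (here `theta`) comes from the summary above.

```python
from collections import Counter


def place_edges(edges, num_workers, theta):
    """Assign each (src, dst) edge to a worker with a degree-dependent policy.

    Hypothetical policies, assumed for illustration only:
      - out-degree(src) <= theta: place by source, so a low-degree
        vertex keeps all of its out-edges on one worker.
      - out-degree(src) >  theta: place by destination, so a hub's
        out-edges are spread across workers.
    """
    out_degree = Counter(src for src, _ in edges)
    placement = {worker: [] for worker in range(num_workers)}
    for src, dst in edges:
        key = dst if out_degree[src] > theta else src
        placement[key % num_workers].append((src, dst))
    return placement


# Toy graph: vertex 0 is a hub with 6 out-edges; vertices 1 and 2 are low-degree.
edges = [(0, d) for d in range(1, 7)] + [(1, 2), (2, 3)]
placement = place_edges(edges, num_workers=3, theta=2)
for worker, assigned in placement.items():
    print(worker, assigned)
```

With `theta=2`, the hub's six out-edges are distributed over all three workers, while each low-degree vertex stays on a single worker. Raising `theta` above the hub's out-degree puts all six hub edges on one worker, which is the kind of imbalance a degree-aware policy is meant to avoid.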

Background and related work

Experiments

Findings

Conclusion

Journal: PloS one	Publication Date: Jan 24, 2020
Citations: 4	License type: CC BY 4.0
