Abstract

Deep learning has recently been shown to be effective at uncovering hidden patterns in non-Euclidean spaces, where data is represented as graphs with complex object relationships and interdependencies. Because of the implicit data dependencies in big graphs with millions of nodes and billions of edges, it is hard for industrial communities to exploit these methods to address real-world challenges at scale. The skewness of big graphs, the distributed file system performance penalty on small k-hop neighborhood subgraphs, and varying subgraph sizes make Graph Neural Network (GNN) training even more challenging in a distributed environment using parameter servers. To address these issues, we propose a scalable, layered, fault-tolerant, in-memory distributed computing-based graph neural network framework called the Graph Distributed Learning Library (GDLL). The base layer utilizes an optimized distributed file system and a scalable graph data store to reduce the performance penalty. The second layer provides distributed graph processing using in-memory graph programming models while optimizing and hiding the underlying complexity of information-complete subgraph computation. In the third layer, GNN modules are deployed on top of the first two layers for efficient distributed training using parameter servers. Finally, we evaluate GDLL against state-of-the-art solutions and show that it outperforms them significantly in terms of efficiency while maintaining similar GNN convergence.
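As a rough illustration of the training style the abstract refers to, the sketch below implements a toy parameter-server loop for a single mean-aggregation GNN layer in plain NumPy. All names (ParameterServer, gnn_layer, worker_step) are hypothetical and do not correspond to GDLL's actual API; the point is only to show how a worker pulls shared weights, computes a gradient on its local subgraph partition, and pushes the update back to the server.

```python
# Minimal single-process sketch of parameter-server-style GNN training.
# Names are illustrative only; GDLL's real interfaces are not shown here.
import numpy as np

class ParameterServer:
    """Holds the shared model weights; workers pull weights and push gradients."""
    def __init__(self, dim_in, dim_out, lr=0.01):
        self.weights = np.random.randn(dim_in, dim_out) * 0.1
        self.lr = lr

    def pull(self):
        return self.weights.copy()

    def push(self, grad):
        # Apply each worker's gradient as it arrives (asynchronous-style update).
        self.weights -= self.lr * grad

def gnn_layer(adj, feats, weights):
    """One mean-aggregation message-passing layer: aggregate neighbors, then transform."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-9
    agg = adj @ feats / deg            # mean over neighbors
    return np.tanh(agg @ weights)      # linear transform + nonlinearity

def worker_step(server, adj, feats, targets):
    """A worker computes a local gradient on its subgraph partition and pushes it."""
    w = server.pull()
    out = gnn_layer(adj, feats, w)
    err = out - targets                                # squared-error residual
    deg = adj.sum(axis=1, keepdims=True) + 1e-9
    agg = adj @ feats / deg
    # Gradient of 0.5 * ||out - targets||^2 w.r.t. w, through the tanh.
    grad = agg.T @ (err * (1 - out ** 2)) / len(feats)
    server.push(grad)
    return float(0.5 * (err ** 2).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = (rng.random((8, 8)) < 0.3).astype(float)     # toy subgraph partition
    feats = rng.standard_normal((8, 4))
    targets = rng.standard_normal((8, 2))
    server = ParameterServer(dim_in=4, dim_out=2)
    for epoch in range(5):
        loss = worker_step(server, adj, feats, targets)
        print(f"epoch {epoch}: loss={loss:.4f}")
```

In a real deployment, many such workers would run in parallel over different graph partitions and synchronize only through the parameter server, which is the setting the abstract describes.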

Highlights

  • Graph topologies can naturally represent real-world data in a variety of applications

  • Existing frameworks focus more on the training of graph learning models but overlook system integrity and generalizability. To address such issues and to fill this research gap, in this paper we present the Graph Distributed Learning Library (GDLL), a scalable, layered, fault-tolerant, in-memory, and shared-nothing architecture-based framework for distributed Graph Neural Network (GNN) training

  • The proposed framework is composed of three layers, i.e., Graph Data Layer (GDL), Graph Optimization Layer (GOL), and Graph Learning Layer (GLL)

Summary

INTRODUCTION

Graph topologies can naturally represent real-world data in a variety of applications. AliGraph implements a distributed in-memory graph storage engine, which requires standalone deployment before training a GNN model. In the latter class, NeuGraph [12] is based on the Scatter-Apply-Gather [13] graph processing model. Existing frameworks focus more on the training of graph learning models but overlook system integrity and generalizability. To address such issues and to fill this research gap, in this paper we present the Graph Distributed Learning Library (GDLL), a scalable, layered, fault-tolerant, in-memory, and shared-nothing architecture-based framework for distributed GNN training. The second layer (section IV-B), called GOL, provides distributed graph processing on top of an in-memory MapReduce framework (Apache Spark [15]) while optimizing and hiding the underlying complexity of advanced message passing, k-hop-based subgraph computation, and graph sampling techniques. We implement the proposed GDLL framework and conduct extensive experiments to validate our claims.
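For reference, the k-hop-based subgraph computation mentioned above can be sketched as a simple breadth-first traversal. The function below is a minimal single-machine illustration in plain Python, not GDLL's Spark-based implementation; the adjacency-list format and function name are assumptions made for the example.

```python
# Minimal sketch of k-hop neighborhood subgraph extraction via BFS.
from collections import deque

def k_hop_subgraph(adj, seed, k):
    """Return the nodes and edges of the k-hop neighborhood around `seed`.

    adj  : dict mapping node -> iterable of neighbor nodes
    seed : node whose neighborhood is extracted (e.g., a training target)
    k    : number of hops
    """
    visited = {seed: 0}                  # node -> hop distance from seed
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if visited[node] == k:           # do not expand beyond k hops
            continue
        for nbr in adj.get(node, ()):
            if nbr not in visited:
                visited[nbr] = visited[node] + 1
                queue.append(nbr)
    nodes = set(visited)
    # Keep only edges whose endpoints both fall inside the k-hop node set, so the
    # subgraph carries all information a k-layer GNN needs to embed `seed`.
    edges = [(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes]
    return nodes, edges

if __name__ == "__main__":
    # Toy undirected graph stored as a directed adjacency list.
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
    nodes, edges = k_hop_subgraph(adj, seed=0, k=2)
    print(sorted(nodes))    # [0, 1, 2, 3]
    print(sorted(edges))
```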

RELATED WORK
NOTATIONS
GNNS AS MESSAGE PASSING
Current State
K-HOP NEIGHBORHOOD
PROPOSED GDLL FRAMEWORK
GRAPH DATA LAYER
K-Hop based subgraph computation
GDLL LIBRARY AND SCENARIOS
GDLL RESULTS AND EVALUATION
DATASET
DISTRIBUTED GNN TRAINING
Methods
CONCLUSION