DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs

Da Zheng, Xiang Song, Chao Ma, Minjie Wang, Jinjing Zhou, Quan Gan, George Karypis, Qidong Su, Zheng Zhang

doi:10.1109/ia351965.2020.00011

Abstract

Graph neural networks (GNNs) have shown great success in learning from graph-structured data. They are widely used in applications such as recommendation, fraud detection, and search. In these domains, the graphs are typically large, containing hundreds of millions of nodes and several billion edges. To tackle this challenge, we develop DistDGL, a system for training GNNs in a mini-batch fashion on a cluster of machines. DistDGL is based on the Deep Graph Library (DGL), a popular GNN development framework. DistDGL distributes the graph and its associated data (initial features and embeddings) across the machines and uses this distribution to derive a computational decomposition by following an owner-compute rule. DistDGL follows a synchronous training approach and allows the ego-networks forming the mini-batches to include non-local nodes. To minimize the overheads associated with distributed computations, DistDGL uses a high-quality and lightweight min-cut graph partitioning algorithm along with multiple balancing constraints. This allows it to reduce communication overheads and statically balance the computations. It further reduces communication by replicating halo nodes and by using sparse embedding updates. The combination of these design choices allows DistDGL to train high-quality models while achieving high parallel efficiency and memory scalability. We demonstrate our optimizations on both inductive and transductive GNN models. Our results show that DistDGL achieves linear speedup without compromising model accuracy and requires only 13 seconds to complete a training epoch for a graph with 100 million nodes and 3 billion edges on a cluster with 16 machines.
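The partitioning ideas named in the abstract (owner-compute rule, halo-node replication, edge cut) can be illustrated with a small toy sketch. This is not DistDGL code and does not use the DGL API: the function names `build_partitions` and `assign_training_nodes`, the six-node graph, and the hand-written partition assignment are all hypothetical, and edges are treated as undirected for simplicity.

```python
from collections import defaultdict


def build_partitions(edges, owner):
    """Return owned nodes, halo nodes, and the edge cut for each partition.

    edges: iterable of (src, dst) pairs, treated as undirected (assumption)
    owner: dict mapping node id -> partition id (hypothetical, hand-assigned)
    """
    owned = defaultdict(set)
    halo = defaultdict(set)
    for node, part in owner.items():
        owned[part].add(node)

    cut = 0
    for src, dst in edges:
        p_src, p_dst = owner[src], owner[dst]
        if p_src != p_dst:
            # The edge crosses partitions: it counts toward the cut, and each
            # side keeps a replica (halo node) of the remote endpoint.
            cut += 1
            halo[p_src].add(dst)
            halo[p_dst].add(src)
    return dict(owned), dict(halo), cut


def assign_training_nodes(train_nodes, owner):
    """Owner-compute rule: each training node is processed by its owner."""
    work = defaultdict(list)
    for node in train_nodes:
        work[owner[node]].append(node)
    return dict(work)


# Toy graph: a 6-node ring plus one chord, split into two partitions.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

owned, halo, cut = build_partitions(edges, owner)
work = assign_training_nodes([0, 2, 3, 5], owner)

print("edge cut:", cut)
print("halo nodes per partition:", halo)
print("training work per partition:", work)
```

In this toy setting, a lower edge cut means fewer halo replicas and less cross-machine traffic, while an even split of training nodes across owners corresponds to the static load balance the abstract describes.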

Similar Papers

Scalable and Efficient Full-Graph GNN Training for Large Graphs
Xinchen Wan ... Kaiqiang Xu
Proceedings of the ACM on Management of Data | VOL. 1
13 Jun 2023

Semi-Supervised Node Classification on Graphs: Markov Random Fields vs. Graph Neural Networks
Binghui Wang ... Neil Zhenqiang Gong
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

GDLL: A Scalable and Share Nothing Architecture Based Distributed Graph Neural Networks Framework
Duong Thi Thu Van ... Tariq Habib Afridi
IEEE Access | VOL. 10
01 Jan 2021

Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study
Xueqi Dang ... Yves Le Traon
Empirical Software Engineering | VOL. 29
22 Jul 2024
