Abstract

Distributed Graph Neural Network (GNN) training facilitates learning on massive graphs that surpass the storage and computational capabilities of a single machine. Traditional distributed frameworks strive for performance parity with centralized training by maximally recovering cross-instance node dependencies, relying either on inter-instance communication or periodic fallback to centralized training. However, these processes create overhead and constrain the scalability of the framework. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from locally-available parts of the training graph, and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that it is not essential to maximize the recovery of cross-instance node dependencies to achieve performance parity with centralized training. Instead, our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to partition the training graph in order to enhance data uniformity and minimize discrepancies in gradient and loss function across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and 2.31x speedup compared to the fastest baseline, despite using less training data.
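The abstract describes two mechanisms: randomized assignment of nodes (or super-nodes) to training instances, and periodic, time-based averaging of independently trained local models. Below is a minimal sketch of how those two pieces could fit together; it is not the authors' implementation, and all names (`Trainer`, `random_partition`, `aggregate`, `NUM_TRAINERS`, `AGG_INTERVAL`) are hypothetical placeholders, with toy parameter vectors standing in for GNN weights.

```python
# Minimal sketch (not the paper's implementation): randomized node-to-partition
# assignment plus periodic averaging of independently trained local models.
# All identifiers and constants here are hypothetical illustrations.

import random
import numpy as np

NUM_TRAINERS = 4    # hypothetical number of training instances
AGG_INTERVAL = 10   # hypothetical local steps between aggregations
NUM_ROUNDS = 5
DIM = 8             # toy parameter dimension standing in for GNN weights


def random_partition(node_ids, num_parts):
    """Randomly assign nodes (or super-nodes) to partitions to promote
    data uniformity across trainer instances."""
    parts = [[] for _ in range(num_parts)]
    for n in node_ids:
        parts[random.randrange(num_parts)].append(n)
    return parts


class Trainer:
    """A trainer that learns a local model from its own partition only;
    no cross-instance communication during local steps."""

    def __init__(self, nodes, dim):
        self.nodes = nodes
        self.params = np.zeros(dim)

    def local_steps(self, k):
        # Stand-in for k asynchronous SGD steps on the local subgraph.
        for _ in range(k):
            grad = np.random.randn(self.params.shape[0]) * 0.01
            self.params -= grad

    def set_params(self, params):
        self.params = params.copy()


def aggregate(trainers):
    """Time-based synchronization: average local model parameters."""
    return np.mean([t.params for t in trainers], axis=0)


if __name__ == "__main__":
    nodes = list(range(1000))
    partitions = random_partition(nodes, NUM_TRAINERS)
    trainers = [Trainer(p, DIM) for p in partitions]

    for r in range(NUM_ROUNDS):
        for t in trainers:            # in practice these run asynchronously
            t.local_steps(AGG_INTERVAL)
        global_params = aggregate(trainers)
        for t in trainers:
            t.set_params(global_params)
        print(f"round {r}: param norm = {np.linalg.norm(global_params):.4f}")
```

The key design choice mirrored here is that trainers exchange nothing during local steps; the only coordination is the periodic parameter average, which replaces the inter-instance communication or centralized fallback used by prior frameworks.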
