Summary

With the increasing adoption of graph neural networks (GNNs) in the community, various GPU-based graph programming systems have been developed to improve the productivity of GNN development. However, sampling-based GNN training remains inefficient, and we observe that the main bottleneck is data transfer, where vertex features are moved from host memory to the GPU over a limited-bandwidth link. In this article, we propose BRGraph, a sampling-based GNN training system that supports efficient data transfer. BRGraph exploits the duplicate vertices shared between mini-batches through a batch reusing (BR) strategy to avoid redundant data transmission. Furthermore, to reduce the overhead of detecting duplicate vertices, we design an efficient GPU-based parallel batch reusing algorithm. BRGraph also exploits the reuse potential of non-duplicate vertex features through a two-level batch reusing (two-level BR) strategy. Comprehensive evaluations on three representative GNN models show that BRGraph reduces data transfer time by up to 60% and delivers up to 1.79× GNN training speedup over state-of-the-art baselines. In addition, it saves up to 40% of GPU memory while matching the training time of the static cache strategy. With two-level BR applied, BRGraph further reduces data transfer time by 20% compared with BR alone.
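The core idea behind BR, reusing GPU-resident features of vertices that also appeared in the previous mini-batch and transferring only the features of newly appearing vertices, can be illustrated with a minimal PyTorch sketch. The function name, tensor layout, and the searchsorted-based duplicate detection below are illustrative assumptions, not BRGraph's actual implementation.

```python
import torch

def transfer_with_reuse(batch_ids, prev_ids, prev_feats, host_feats, device="cuda"):
    """Assemble the feature tensor for the current mini-batch (sketch).

    batch_ids:  1-D LongTensor of vertex IDs in the current batch (on GPU)
    prev_ids:   1-D LongTensor of vertex IDs of the previous batch (on GPU)
    prev_feats: features of the previous batch, already resident on the GPU
    host_feats: full feature matrix in (ideally pinned) host memory
    """
    # Locate each current vertex in the sorted previous batch; matching
    # positions mark duplicate vertices whose features are already on GPU.
    sorted_prev, order = torch.sort(prev_ids)
    pos = torch.searchsorted(sorted_prev, batch_ids)
    pos = pos.clamp(max=sorted_prev.numel() - 1)
    dup = sorted_prev[pos] == batch_ids  # duplicate-vertex mask

    out = torch.empty(batch_ids.numel(), host_feats.size(1),
                      dtype=prev_feats.dtype, device=device)
    # Reuse GPU-resident features for duplicate vertices (no PCIe traffic).
    out[dup] = prev_feats[order[pos[dup]]]
    # Transfer only the non-duplicate vertices' features from host memory.
    new_ids = batch_ids[~dup]
    out[~dup] = host_feats[new_ids.cpu()].to(device, non_blocking=True)
    return out
```

In this sketch, the fraction of `dup` entries determines the transfer savings: the larger the overlap between consecutive mini-batches, the less data crosses the host-to-GPU link.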