Abstract

Recently, Graph Neural Networks (GNNs) have achieved great success in many applications. Mini-batch training has become the de facto way to train GNNs on giant graphs. However, mini-batch generation is extremely expensive and slows down the whole training process. Researchers have proposed several solutions to accelerate mini-batch generation; however, these solutions (1) fail to exploit the locality of the adjacency matrix, (2) cannot fully utilize the GPU memory, and (3) adapt poorly to diverse workloads. In this work, we propose DUCATI, a Dual-Cache system that overcomes these drawbacks. In addition to the traditional Nfeat-Cache, DUCATI introduces a new Adj-Cache to further accelerate mini-batch generation and better utilize GPU memory. DUCATI develops a workload-aware Dual-Cache Allocator which adaptively finds the best cache allocation plan under different settings. We compare DUCATI with various GNN training systems on four billion-scale graphs under diverse workload settings. The experimental results show that, in terms of training time, DUCATI achieves up to 3.33 times speedup (2.07 times on average) over DGL and up to 1.54 times speedup (1.32 times on average) over state-of-the-art Single-Cache systems. We also analyze the time-accuracy trade-offs of DUCATI and four state-of-the-art GNN training systems. The analysis offers users guidelines on system selection for different input sizes and hardware resources.
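
For intuition, the following is a minimal, hypothetical sketch (in PyTorch) of the dual-cache idea the abstract describes: hot entries of both the adjacency structure (Adj-Cache) and the node features (Nfeat-Cache) are kept in GPU memory, with host memory as the fallback. This is not DUCATI's implementation; the class name, the choice of "hot" vertices, and all method names are illustrative assumptions.

```python
import torch

# Minimal sketch of the dual-cache idea from the abstract.
# NOT DUCATI's actual implementation: the class name, the externally
# supplied set of "hot" vertices, and all method names are assumptions.
class DualCache:
    def __init__(self, features, adj_lists, hot_ids, device="cuda"):
        # Full data stays in host (CPU) memory.
        self.features = features        # [num_nodes, feat_dim] tensor
        self.adj_lists = adj_lists      # dict: vertex id -> 1-D neighbor tensor
        # Hot vertices get both their feature rows (Nfeat-Cache) and
        # their adjacency lists (Adj-Cache) pinned in GPU memory.
        self.gpu_feats = features[hot_ids].to(device)
        self.gpu_adj = {int(v): adj_lists[int(v)].to(device) for v in hot_ids}
        self.slot = {v: i for i, v in enumerate(hot_ids.tolist())}
        self.device = device

    def neighbors(self, v):
        # Adjacency lookup: hit the GPU-resident Adj-Cache first,
        # fall back to a host-to-device copy on a miss.
        cached = self.gpu_adj.get(v)
        return cached if cached is not None else self.adj_lists[v].to(self.device)

    def feature(self, v):
        # Feature lookup: hit the GPU-resident Nfeat-Cache first.
        i = self.slot.get(v)
        return self.gpu_feats[i] if i is not None else self.features[v].to(self.device)
```

In this sketch the split of GPU memory between the two caches (i.e., how many vertices count as "hot" for each) is taken as given; deciding that split adaptively per workload is the role the abstract assigns to the Dual-Cache Allocator.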
