https://doi.org/10.14778/3681954.3681968
Journal: Proceedings of the VLDB Endowment | Publication Date: Jul 1, 2024
Training GNNs over large graphs faces a severe data processing bottleneck involving both sampling and feature loading. To tackle this issue, we introduce F²CGT, a fast GNN training system incorporating feature compression. To avoid potential accuracy degradation, we propose a two-level, hybrid feature compression approach that applies different compression methods to different graph nodes. This differentiated choice strikes a balance between rounding errors, compression ratios, model accuracy loss, and preprocessing costs. Our theoretical analysis proves that this approach guarantees convergence and delivers model accuracy comparable to conventional training without feature compression. Additionally, we co-design the on-GPU cache sub-system with compression-enabled training within F²CGT. The new cache sub-system, driven by a cost model, applies new cache policies that select graph nodes with high access frequencies and partitions the spare GPU memory across different types of graph data to improve cache hit rates. Finally, extensive evaluation of F²CGT on two popular GNN models and four datasets, including three large public datasets, demonstrates that F²CGT achieves a compression ratio of up to 128 and provides GNN training speedups of 1.23-2.56× and 3.58-71.46× for single-machine and distributed training, respectively, with up to 32 GPUs and marginal accuracy loss.
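To make the rounding-error/compression-ratio trade-off mentioned above concrete, here is a minimal, hypothetical sketch (in PyTorch) of uniform scalar quantization applied to a node-feature matrix. It is not F²CGT's actual two-level scheme; the function names and the 8-bit setting are illustrative assumptions only, showing the kind of lossy encoding a hybrid approach might apply to a subset of graph nodes.

```python
# Illustrative sketch only -- NOT the paper's implementation.
# Uniform per-column scalar quantization of GNN node features.
import torch

def quantize_features(feat: torch.Tensor, num_bits: int = 8):
    """Quantize a [num_nodes, dim] float feature matrix column-wise to uint8 codes."""
    qmax = 2 ** num_bits - 1
    col_min = feat.min(dim=0, keepdim=True).values
    col_max = feat.max(dim=0, keepdim=True).values
    scale = (col_max - col_min).clamp(min=1e-8) / qmax      # avoid divide-by-zero
    codes = ((feat - col_min) / scale).round().clamp(0, qmax).to(torch.uint8)
    return codes, scale, col_min

def dequantize_features(codes, scale, col_min):
    """Reconstruct approximate float features before feeding them to the GNN."""
    return codes.to(torch.float32) * scale + col_min

if __name__ == "__main__":
    feats = torch.randn(1000, 128)                 # toy node-feature matrix
    codes, scale, offset = quantize_features(feats, num_bits=8)
    approx = dequantize_features(codes, scale, offset)
    print("max rounding error:", (feats - approx).abs().max().item())
```

With 8-bit codes this yields roughly 4× compression over fp32 features; reaching ratios such as the up-to-128× reported in the paper would require more aggressive techniques (fewer bits, vector quantization, or similar), which is precisely why a hybrid, per-node choice of compression method matters for preserving accuracy.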