Graph Neural Networks (GNNs) have been widely used in graph analysis due to their strong performance on a wide variety of tasks. Unfortunately, as graphs keep growing, large graphs can easily consume terabytes of memory, and training on them may take days. The high memory footprint also limits the use of GNNs on resource-constrained devices such as smartphones and IoT devices. Hence, reducing storage cost, training time, and inference latency is highly desirable. In this work, we apply Product Quantization (PQ) to GNNs for the first time to achieve superior memory capacity reduction. To alleviate the processing burden introduced by PQ and improve compression performance, we propose Enhanced Product Quantization (EPQ), which reduces the input graph data, the dominant source of memory consumption, and accelerates the clustering step of PQ. Moreover, we propose an efficient quantization framework for GNNs (code available at https://github.com/Lyun-Huang/EPQuant) that combines EPQ with Scalar Quantization (SQ) to achieve improved compression performance and computation acceleration on off-the-shelf hardware, enabling the deployment of GNNs on resource-constrained devices. In addition, the proposed framework can be applied to existing GNNs with little porting effort. Extensive experimental results show that the proposed quantization scheme achieves 321.26× and 184.03× memory capacity compression for the input graph data and the overall storage, respectively, with an accuracy loss of less than 1%.
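To illustrate the kind of compression the abstract refers to, below is a minimal sketch of standard Product Quantization applied to a GNN node-feature matrix. This is not the authors' EPQ implementation; the subspace count M, codebook size K, and the toy feature matrix are assumptions for illustration. Each D-dimensional feature row is split into M subvectors, each subspace is clustered with its own K-entry codebook, and only one-byte codes plus the small codebooks are stored instead of the raw floating-point features.

```python
# Minimal sketch of Product Quantization (PQ) on node features.
# Assumptions: M = 8 subspaces, K = 256 centroids per subspace (so each
# subvector is encoded by a single uint8 index); not the paper's EPQ code.
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(X, M=8, K=256):
    """Quantize each of the M feature subspaces with its own k-means codebook."""
    N, D = X.shape
    assert D % M == 0, "feature dimension must be divisible by M"
    d = D // M
    codebooks = np.empty((M, K, d), dtype=X.dtype)
    codes = np.empty((N, M), dtype=np.uint8)  # one byte per subvector
    for m in range(M):
        sub = X[:, m * d:(m + 1) * d]
        km = KMeans(n_clusters=K, n_init=4, random_state=0).fit(sub)
        codebooks[m] = km.cluster_centers_
        codes[:, m] = km.labels_.astype(np.uint8)
    return codes, codebooks

def pq_decompress(codes, codebooks):
    """Reconstruct approximate features by centroid lookup per subspace."""
    N, M = codes.shape
    _, K, d = codebooks.shape
    X_hat = np.empty((N, M * d), dtype=codebooks.dtype)
    for m in range(M):
        X_hat[:, m * d:(m + 1) * d] = codebooks[m][codes[:, m]]
    return X_hat

if __name__ == "__main__":
    X = np.random.randn(10000, 128).astype(np.float32)  # toy node features
    codes, books = pq_compress(X)
    X_hat = pq_decompress(codes, books)
    print(f"compression ratio: {X.nbytes / (codes.nbytes + books.nbytes):.1f}x")
```

In this toy setting the stored data shrinks from N x D float32 values to N x M uint8 codes plus the codebooks; EPQ as described in the abstract additionally reduces the input graph data fed to clustering and speeds up the clustering step itself.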