Graph neural networks (GNNs) tend to suffer from high computational costs due to the rapidly growing scale of graph data and the large number of model parameters, which restricts their utility in practical applications. To this end, some recent works focus on sparsifying GNNs (both graph structures and model parameters) with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance. However, LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in extremely high training costs, and 2) they only trim graph structures and model parameters while ignoring the node feature dimension, where substantial redundancy exists. To overcome these limitations, we propose a comprehensive graph gradual pruning framework termed CGP, which adopts a during-training pruning paradigm to dynamically prune GNNs within a single training process. Unlike LTH-based methods, the proposed CGP approach requires no retraining, which significantly reduces computation costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim all three core elements of GNNs: graph structures, node features, and model parameters. To refine the pruning operation, we also introduce a regrowth process into the CGP framework to re-establish pruned but important connections. The proposed CGP is evaluated on the node classification task across six GNN architectures, including shallow models [graph convolutional network (GCN) and graph attention network (GAT)], shallow-but-deep-propagation models [simple graph convolution (SGC) and approximate personalized propagation of neural predictions (APPNP)], and deep models [GCN via initial residual and identity mapping (GCNII) and residual GCN (ResGCN)], on a total of 14 real-world graph datasets, including large-scale graphs from the challenging Open Graph Benchmark (OGB). Experiments show that the proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
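The exact pruning criteria of CGP are specified in the full paper; the sketch below is only a minimal, assumption-laden illustration of the during-training "prune then regrow" idea described above, not the authors' implementation. It applies binary masks to the adjacency matrix, node features, and weights of a one-layer GCN-style propagation, deactivates the smallest-magnitude active entries, and regrows the pruned entries with the largest loss gradients. All function names, the magnitude/gradient criteria, and the ratios are hypothetical.

```python
# Minimal sketch of during-training co-sparsification with regrowth.
# NOT the authors' CGP code; criteria, names, and ratios are assumptions.
import torch

def magnitude_prune(mask: torch.Tensor, scores: torch.Tensor, ratio: float) -> torch.Tensor:
    """Deactivate the lowest-|score| fraction of currently active mask entries."""
    active = mask.bool()
    k = int(ratio * int(active.sum()))
    if k < 1:
        return mask
    thresh = torch.kthvalue(scores[active].abs(), k).values
    out = mask.clone()
    out[active & (scores.abs() <= thresh)] = 0.0
    return out

def gradient_regrow(mask: torch.Tensor, grad: torch.Tensor, k: int) -> torch.Tensor:
    """Re-activate the k pruned entries with the largest gradient magnitude."""
    pruned = ~mask.bool()
    n = min(k, int(pruned.sum()))
    if n < 1:
        return mask
    idx = torch.topk((grad.abs() * pruned).flatten(), n).indices
    out = mask.clone().flatten()
    out[idx] = 1.0
    return out.view_as(mask)

# Toy data: a 5-node graph with 8-dim features and a 4-dim output layer.
adj = torch.rand(5, 5)
x = torch.rand(5, 8)
w = torch.randn(8, 4, requires_grad=True)
m_adj, m_feat, m_w = torch.ones_like(adj), torch.ones(1, 8), torch.ones_like(w)

# One gradual-pruning step (repeated periodically during training).
w_eff = w * m_w
w_eff.retain_grad()                      # keep dense gradients for regrowth
h = (adj * m_adj) @ (x * m_feat) @ w_eff # masked GCN-style propagation
loss = h.pow(2).mean()                   # stand-in for the task loss
loss.backward()

m_w = magnitude_prune(m_w, w.detach(), ratio=0.2)  # trim small weights
m_w = gradient_regrow(m_w, w_eff.grad, k=2)        # restore important ones
```

The same two functions would apply analogously to `m_adj` (scored by edge magnitudes) and `m_feat` (scored column-wise over feature dimensions), so that all three elements are co-sparsified within a single training run.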