On Representation Knowledge Distillation for Graph Neural Networks.

Chaitanya K Joshi,Xu Xun,Jie Lin,Fayao Liu,Chuan Sheng Foo

doi:10.1109/tnnls.2022.3223018

Chaitanya K Joshi, Xu Xun + Show 3 more

Open Access

PDF Available

https://doi.org/10.1109/tnnls.2022.3223018

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher's node embeddings. This article studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose graph contrastive representation distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across four datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving (GSP) variant of LSP) as well as baselines from 2-D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.

Full Text