Abstract

Given a trained deep graph convolution network (GCN), how can we effectively compress it into a compact network without significant loss of accuracy? Compressing a trained deep GCN into a compact GCN is of great importance for deploying the model in environments with limited computing resources, such as mobile or embedded systems. However, previous works on compressing deep GCNs do not consider multi-hop aggregation, even though it is the main purpose of stacking multiple GCN layers. In this work, we propose MustaD (Multi-staged knowledge Distillation), a novel approach for compressing deep GCNs into single-layered GCNs through multi-staged knowledge distillation (KD). MustaD distills the knowledge of 1) the aggregation from multiple GCN layers as well as 2) the task prediction, while preserving the multi-hop feature aggregation of deep GCNs with a single effective layer. Extensive experiments on four real-world datasets show that MustaD provides state-of-the-art performance compared to other KD-based methods. Specifically, MustaD achieves up to 4.21%p higher accuracy than the second-best KD models.
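To make the two-part objective concrete, below is a minimal sketch (in PyTorch) of a distillation loss that combines 1) matching the teacher's aggregated hidden representations and 2) matching the teacher's soft predictions, alongside the usual supervised loss. The function name, the weighting terms alpha and beta, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def mustad_style_kd_loss(student_logits, teacher_logits,
                         student_hidden, teacher_hidden,
                         labels, temperature=2.0, alpha=0.5, beta=0.5):
    """Hypothetical two-part distillation objective:
    (1) aggregation KD: match the teacher's multi-hop hidden representations,
    (2) prediction KD: match the teacher's temperature-scaled soft predictions,
    plus the usual supervised cross-entropy on labeled nodes."""
    # (1) Aggregation-level distillation; assumes student/teacher hidden sizes match
    #     (otherwise a linear projection would be needed).
    agg_kd = F.mse_loss(student_hidden, teacher_hidden)

    # (2) Prediction-level distillation with soft targets (Hinton et al. style).
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    pred_kd = F.kl_div(soft_student, soft_teacher,
                       log_target=True, reduction="batchmean") * temperature ** 2

    # Supervised loss on the downstream node-classification task.
    ce = F.cross_entropy(student_logits, labels)

    return ce + alpha * agg_kd + beta * pred_kd
```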

Highlights

  • Given a trained deep graph convolution network, how can we compress it into a compact network without a significant drop in accuracy? Graph Convolution Network (GCN) [1] learns latent node representations in graph data, and plays a crucial role as a feature extractor when a model is jointly trained to learn node features and perform a specific task

  • We show that the student distilled by our proposed MustaD simulates the K-order polynomial filter with inter-dependent coefficients using only a linear transformation layer and a single effective layer, and thus has expressiveness similar to that of the K-layer GCN (see the sketch after this list)

  • We investigate the effect of the single effective layer by comparing the proposed MustaD to a student with a single naive GCN layer
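As a rough illustration of the "single effective layer" idea referenced above, the following sketch applies one linear transformation followed by K parameter-free propagation steps over the normalized adjacency, so a single set of weights covers a K-hop receptive field. The class name, constructor arguments, and propagation rule are assumptions for illustration, not MustaD's exact architecture.

```python
import torch
import torch.nn as nn

class SingleEffectiveLayer(nn.Module):
    """Hypothetical compact student: one linear transformation followed by
    K parameter-free propagation steps over the normalized adjacency, so a
    single set of weights covers a K-hop receptive field (similar in spirit
    to SGC/APPNP-style propagation)."""
    def __init__(self, in_dim, out_dim, num_hops):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # the single transformation layer
        self.num_hops = num_hops                  # K, matching the teacher's depth

    def forward(self, x, adj_norm):
        # adj_norm: normalized adjacency D^{-1/2}(A + I)D^{-1/2}, dense or sparse
        h = self.linear(x)
        for _ in range(self.num_hops):
            h = torch.sparse.mm(adj_norm, h) if adj_norm.is_sparse else adj_norm @ h
        return h
```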

Summary

Introduction

Given a trained deep graph convolution network, how can we compress it into a compact network without a significant drop in accuracy? Graph Convolution Network (GCN) [1] learns latent node representations in graph data, and plays a crucial role as a feature extractor when a model is jointly trained to learn node features and perform a specific task. Knowledge Distillation (KD) has been popular due to its simplicity based on a student-teacher model; KD distills the knowledge from a large teacher model into a smaller student model so that the student performs as well as the teacher [18,19,20]. In this context, Yang et al. [21] have recently proposed a KD method called LSP (Local Structure Preserving) for compressing GCN models. However, LSP does not consider the teacher's knowledge of multi-hop feature aggregation, a process that is essential in a deep-layered GCN; as a result, its performance in preserving accuracy is limited, especially when compressing a deep GCN. We propose MustaD, a novel approach for compressing deep-layered GCNs by distilling the knowledge of both the feature aggregation and the feature representation. The code and the datasets are available at https://github.com/snudatalab/MustaD.
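For readers who want to see how the pieces could fit together, here is a hypothetical training step that freezes a deep teacher and distills into a compact student using the loss sketched after the abstract. All names (train_step, task_head, and the teacher's (hidden, logits) return signature) are assumptions, not the authors' implementation.

```python
import torch

def train_step(student, task_head, teacher, x, adj_norm, labels, optimizer):
    """One distillation step: the deep teacher is frozen, and the compact
    student is trained with the combined loss sketched after the abstract.
    `teacher` is assumed to return (hidden, logits); `task_head` is a simple
    classifier on top of the student's hidden representation."""
    teacher.eval()
    with torch.no_grad():
        t_hidden, t_logits = teacher(x, adj_norm)

    s_hidden = student(x, adj_norm)        # e.g. the SingleEffectiveLayer above
    s_logits = task_head(s_hidden)         # e.g. nn.Linear(out_dim, num_classes)

    loss = mustad_style_kd_loss(s_logits, t_logits, s_hidden, t_hidden, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```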

Related work
Experiments
Experimental setup
Findings
Conclusion
