NodeAug: Semi-Supervised Node Classification with Data Augmentation

Yiwei Wang, Juncheng Liu, Yuxuan Liang, Yujun Cai, Wei Wang, Bryan Hooi

doi:10.1145/3394486.3403063

Abstract

We present a new method that uses Data Augmentation (DA) to enhance Graph Convolutional Networks (GCNs), which are the state-of-the-art models for semi-supervised node classification. DA for graph data remains under-explored. Because nodes are connected by edges, the augmentations applied to different nodes influence each other and can lead to undesired results, such as uncontrollable DA magnitudes and changes of ground-truth labels. To address this issue, we present the NodeAug (Node-Parallel Augmentation) scheme, which creates a 'parallel universe' for each node in which to conduct DA, blocking the undesired effects from other nodes. NodeAug regularizes the model prediction of every node (including unlabeled nodes) to be invariant with respect to the changes induced by DA, so as to improve the model's effectiveness. To augment the input features from different aspects, we propose three DA strategies that modify both the node attributes and the graph structure. In addition, we introduce subgraph mini-batch training for the efficient implementation of NodeAug. In each iteration, this approach takes as input the subgraph corresponding to the receptive fields of a batch of nodes, rather than the whole graph used by prior full-batch training. Empirically, NodeAug yields significant gains for strong GCN models on Cora, Citeseer, Pubmed, and two co-authorship networks, with a more efficient training process thanks to the proposed subgraph mini-batch training approach.
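
To make the subgraph mini-batch idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the receptive field of a node under a K-layer GCN is its K-hop neighborhood, and that the graph is given as a plain adjacency dictionary. The helper names `receptive_field` and `induced_subgraph` are hypothetical and chosen only for illustration.

```python
from collections import deque


def receptive_field(adj, batch, num_hops):
    """Return the set of nodes within `num_hops` edges of any node in `batch`.

    `adj` maps each node to an iterable of its neighbors (assumed format).
    """
    seen = set(batch)
    frontier = deque((node, 0) for node in batch)
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue  # do not expand beyond the receptive field
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def induced_subgraph(adj, nodes):
    """Keep only the edges whose two endpoints both lie in `nodes`."""
    return {n: [m for m in adj.get(n, ()) if m in nodes] for n in sorted(nodes)}


# Toy path graph 0-1-2-3-4-5 (illustrative data, not from the paper).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# With a 2-layer model, a batch containing node 0 only needs nodes 0, 1, 2.
field = receptive_field(adj, batch=[0], num_hops=2)
sub = induced_subgraph(adj, field)
print(sorted(field))
print(sub)
```

Under these assumptions, each training iteration would feed only `sub` to the model instead of the full graph, which is where the efficiency gain over full-batch training would come from.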

Full Text