Abstract

Graph neural networks (GNNs) are the dominant paradigm for modeling graph-structured data by learning general-purpose node representations. Training GNNs traditionally requires a large amount of labeled data, which is time-consuming and expensive to obtain; in some scenarios, labels are unavailable altogether. Self-supervised representation learning, which generates labels from the graph-structured data itself, is a promising way to tackle this problem. Self-supervised learning on heterogeneous graphs is more challenging than on homogeneous graphs, and it has also been studied far less. In this paper, we propose a SElf-supervised learning method for heterogeneous graphs via Structure Information based on Metapaths (SESIM). First, pseudo-labels are constructed from the data itself to train pretext tasks, avoiding time-consuming manual labeling. Next, a conventional graph neural network aggregates node features to obtain node embeddings, on which both the primary task and the pretext tasks are defined. The pretext tasks, namely predicting the number of jumps between nodes along each metapath, improve the representation ability of the primary task. Moreover, predicting jump numbers along each metapath effectively exploits graph structural information, which is an essential property of nodes, so SESIM deepens the model's understanding of graph structure. Finally, we train the primary task and the pretext tasks jointly and balance the contribution of the pretext tasks to the primary task. The key advantages of our proposed model are twofold: we study self-supervised learning on heterogeneous graphs to address the time and cost of obtaining labels, and we design a novel pretext task, jump-number prediction along each metapath, based on metapath-level structural information.
Empirical results validate the performance of SESIM and demonstrate that it improves the representation ability of traditional graph neural networks on both link prediction and node classification tasks.
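The joint training described above, where a primary-task loss is combined with metapath-level pretext losses under a balancing weight, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `JointLoss`, the use of cross-entropy for both tasks, and the single balancing weight `pretext_weight` are all assumptions.

```python
import torch
import torch.nn as nn

class JointLoss(nn.Module):
    """Hypothetical SESIM-style objective: primary-task loss plus a
    weighted sum of jump-number-prediction losses, one per metapath."""

    def __init__(self, pretext_weight: float = 0.5):
        super().__init__()
        self.pretext_weight = pretext_weight      # balances pretext vs. primary task
        self.primary_loss = nn.CrossEntropyLoss() # e.g., node classification
        self.pretext_loss = nn.CrossEntropyLoss() # jump-number prediction per metapath

    def forward(self, primary_logits, primary_labels,
                pretext_logits_list, pretext_labels_list):
        # Primary-task loss on the shared node embeddings.
        loss = self.primary_loss(primary_logits, primary_labels)
        # Add one weighted pretext loss per metapath.
        for logits, labels in zip(pretext_logits_list, pretext_labels_list):
            loss = loss + self.pretext_weight * self.pretext_loss(logits, labels)
        return loss
```

In practice the weight (or one weight per metapath) would be tuned to keep the pretext tasks from dominating the primary objective.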
