Abstract
Link prediction, which aims to infer missing edges or predict future edges from currently observed graph connections, has emerged as a powerful technique for diverse applications such as recommendation and relation completion. While there is a rich literature on link prediction based on node representation learning, direct link embedding is less studied and less understood. A common practice in previous work characterizes a link by manipulating the embeddings of its two incident nodes, which fails to capture effective link features. Moreover, common link prediction methods such as random walks and graph auto-encoders usually rely on full-graph training, which leads to poor scalability and high resource consumption on large-scale graphs. In this paper, we propose Inductive Subgraph Embedding for Link Prediction (SE4LP), an end-to-end scalable representation learning framework for link prediction that exploits the strong correlation between a central link and its neighborhood subgraph to characterize links. We sample “link-centric induced subgraphs” as input and use subgraph-level contrastive discrimination as a pretext task to learn intrinsic and structural link features via subgraph classification. Extensive experiments on five datasets demonstrate that SE4LP significantly outperforms state-of-the-art methods in link prediction in terms of both performance and scalability. Further analysis demonstrates that introducing self-supervision into link prediction can significantly reduce the dependence on training data and improve the generalization and scalability of the model.
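To make the idea of a “link-centric induced subgraph” more concrete, here is a minimal sketch of one plausible way to extract such a subgraph. It assumes an undirected graph stored as an adjacency dict with integer node ids and takes the subgraph induced by the union of the k-hop neighborhoods of the two endpoints. This is not SE4LP's actual sampler. The function names, the k-hop radius, and the removal of the target edge (a common practice in subgraph-based link prediction, assumed here so the model cannot read the answer off its input) are all illustrative choices.

```python
from collections import deque


def k_hop_nodes(adj, source, k):
    """Return the set of nodes within k hops of `source` (breadth-first search)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand beyond the k-hop radius
        for v in adj.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return set(depth)


def link_centric_subgraph(adj, u, v, k=1):
    """Hypothetical helper: induced subgraph around the candidate link (u, v).

    Nodes are the union of the k-hop neighborhoods of u and v; edges are all
    graph edges with both endpoints in that node set, stored as (min, max) pairs.
    """
    nodes = k_hop_nodes(adj, u, k) | k_hop_nodes(adj, v, k)
    edges = set()
    for a in nodes:
        for b in adj.get(a, ()):
            if b in nodes and a < b:  # keep each undirected edge once
                edges.add((a, b))
    # Assumption: hide the target link itself from the sampled input.
    edges.discard((min(u, v), max(u, v)))
    return nodes, edges
```

In a full pipeline, each sampled subgraph would be passed to a subgraph encoder and classifier, and the contrastive pretext task would compare embeddings of different subgraphs. Those parts are not shown here.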