Information aggregation and propagation over networks via graph neural networks (GNNs) plays an important role in node or graph representation learning, which currently depend on the calculation with a fixed adjacency matrix, facing over-smoothing problem, and difficulty to stack multiple layers for high-level representations. In contrast, Transformer calculates an importance score for each node to learn its embedding via the attention mechanism and has achieved great successes in many natural language processing (NLP) and computer vision (CV) tasks. However, Transformer is inflexible to extend to graphs, as its input and output must have the same dimension. It will also become intractable to allocate attention over a large-scale graph due to distractions. Moreover, most graph Transformers are trained in supervised ways, which consume additional resources to annotate samples with potentially wrong labels and have limited generalization of representations. Therefore, this article attempts to build a new Sparse Graph Transformer with Contrastive learning for graph representation learning, called SGTC. Specifically, we first employ centrality measures to remove the redundant topological information from input graph according to the influences of nodes and edges, then disturb the pruned graph to get two different augmentation views, and learn node representations in a contrastive manner. Besides, a novel sparse attention mechanism is also proposed to capture structural features of graphs, which effectively save memory and training time. SGTC can produce low-dimensional and high-order node representations, which have better generalization for multiple tasks. The proposed model is evaluated on three downstream tasks over six networks, and experimental results confirm its superior performance against the state-of-the-art baselines.
Read full abstract