Graph self-supervised learning is an effective technique for learning general knowledge from unlabeled graph data through pretext tasks. To capture the relationships among nodes and their essential roles at a global level, existing methods use clustering labels as self-supervised signals. However, in some cases these labels are noisy, which can cause the model to overfit and degrade its performance. To address this issue, a novel framework, Graph Self-Supervised Curriculum Learning based on clustering label smoothing (GSSCL), is proposed. GSSCL learns clustering knowledge in an easy-to-difficult manner, which reduces its heavy dependence on the reliability of the clustering and improves the generalizability of the model. Specifically, the Silhouette Coefficient is used to compute a clustering confidence score for every node, and nodes with high confidence scores are selected for self-supervised learning. Because graphs may contain complex heterophilous information (e.g., noisy links), clustering pseudo-label smoothing is performed on K-nearest-neighbor graphs built from the similarities between node features rather than on the original graph structure. The resulting multi-scale knowledge is then used for curriculum learning. Finally, comprehensive experiments on diverse public graph benchmarks demonstrate the effectiveness of the proposed framework, which achieves results comparable to state-of-the-art methods on semi-supervised node classification and clustering tasks.
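The abstract does not give GSSCL's exact formulas, so the following is only a minimal NumPy sketch of the pipeline it describes, under several assumptions. Confidence is taken as the per-node Silhouette Coefficient with Euclidean distance. The K-nearest-neighbor graph uses cosine similarity of node features. Label smoothing is assumed to be a single step that mixes each one-hot pseudo-label with the mean label of its neighbors, weighted by a hypothetical `alpha`. The curriculum is assumed to be a linear pacing schedule that grows the selected set of confident nodes from a hypothetical `start_frac` to all nodes. All function names are hypothetical.

```python
# Hypothetical sketch of the steps named in the abstract; not the GSSCL reference implementation.
import numpy as np


def silhouette_scores(x: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-node Silhouette Coefficient s(i) = (b - a) / max(a, b), Euclidean distance.

    a: mean distance to the other members of the node's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Assumes at least two clusters; nodes in singleton clusters get s(i) = 0.
    """
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the node itself from its own cluster mean
        if not same.any():
            continue  # singleton cluster: score stays 0 by convention
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores


def knn_graph(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 kNN adjacency from cosine similarity of node features (assumption)."""
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    sim = (x / norms) @ (x / norms).T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape)
    adj[np.repeat(np.arange(len(x)), k), neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # symmetrize the directed kNN relation


def smooth_labels(labels, adj, num_clusters, alpha=0.5):
    """Mix each one-hot pseudo-label with the mean label of its kNN neighbors (assumed form)."""
    one_hot = np.eye(num_clusters)[labels]
    degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    neighbor_mean = (adj @ one_hot) / degree
    return (1.0 - alpha) * one_hot + alpha * neighbor_mean


def curriculum_mask(confidence, epoch, total_epochs, start_frac=0.3):
    """Keep the most confident nodes; the kept fraction grows linearly to 1 (assumed pacing)."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    n_keep = max(1, int(round(frac * len(confidence))))
    mask = np.zeros(len(confidence), dtype=bool)
    mask[np.argsort(-confidence)[:n_keep]] = True
    return mask


# Toy data: two well-separated feature blobs stand in for node features and clustering pseudo-labels.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, (10, 4)), rng.normal(3.0, 0.3, (10, 4))])
labels = np.array([0] * 10 + [1] * 10)
conf = silhouette_scores(x, labels)
adj = knn_graph(x, k=3)
soft = smooth_labels(labels, adj, num_clusters=2)
early = curriculum_mask(conf, epoch=0, total_epochs=10)  # 6 most confident nodes
late = curriculum_mask(conf, epoch=9, total_epochs=10)   # all 20 nodes
```

Building the smoothing graph from feature similarity rather than from the input edges mirrors the abstract's motivation: a noisy or heterophilous link in the original graph does not directly pull a node's pseudo-label toward an unrelated cluster. The single mixing step and the linear schedule are chosen here only for brevity; the paper's smoothing and pacing may differ.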