Abstract

Attributed graph clustering, which learns node representation from node attribute and topological graph for clustering, is a fundamental and challenging task for multimedia network-structured data analysis. Recently, graph contrastive learning (GCL)-based methods have obtained impressive clustering performance on this task. Nevertheless, there still remain some limitations to be solved: 1) most existing methods fail to consider the self-consistency between latent representations and cluster structures; and 2) most methods require a post-processing operation to get clustering labels. Such a two-step learning scheme results in models that cannot handle newly generated data, <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">i.e.</i> , out-of-sample (OOS) nodes. To address these issues in a unified framework, a <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">S</b> elf-consistent <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C</b> ontrastive <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">A</b> ttributed <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">G</b> raph <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">C</b> lustering (SCAGC) network with pseudo-label prompt is proposed in this article. In SCAGC, by clustering labels prompt information, a self-consistent contrastive loss, which aims to maximize the consistencies of intra-cluster representations while minimizing the consistencies of inter-cluster representations, is designed for representation learning. Meanwhile, a clustering module is built to directly output clustering labels by contrasting the representation of different clusters. Thus, for the OOS nodes, SCAGC can directly calculate their clustering labels. Extensive experimental results on seven benchmark datasets have shown that SCAGC consistently outperforms 16 competitive clustering methods. The source code could be accessed at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/xdweixia/SCAGC</uri> .

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call