Abstract
We perform theoretical and algorithmic studies of clustering and semi-supervised classification on graphs that carry both pairwise relational information and single-point feature information, based on a joint stochastic block model for generating synthetic graphs with both edges and node features. An asymptotically exact analysis based on Bayesian inference of the underlying model is conducted, using the cavity method from statistical physics. Theoretically, we identify a phase transition of the generative model, which puts fundamental limits on the ability of all possible algorithms in the clustering task of the underlying model. Algorithmically, we propose a belief propagation algorithm that is asymptotically optimal on the generative model, and that can be further extended to a belief propagation graph convolution neural network (BPGCN) for semi-supervised classification on graphs. For the first time, well-controlled benchmark datasets with asymptotically exact properties and optimal solutions can be produced for the evaluation of graph convolution neural networks, and for the theoretical understanding of their strengths and weaknesses. In particular, on these synthetic benchmark networks we observe that existing graph convolution neural networks are subject to a sparsity issue and an overfitting issue in practice, both of which are successfully overcome by our BPGCN. Moreover, when combined with classic neural network methods, BPGCN yields classification performance on some real-world datasets that has not been achieved before.
Highlights
Learning on graphs is an important task in machine learning and the broader data sciences, with many successful applications in various fields, including social sciences, biology, and computer science
Using the cavity method from statistical physics and the corresponding belief propagation (BP) algorithm, which is asymptotically exact in the thermodynamic limit, we uncover theoretical results on the detectability phase transition point and the phase diagram of the joint stochastic block model (JSBM)
Based on the BP equations established on the JSBM, we propose an algorithm for semi-supervised classification that adopts the graph convolution network structure, which we term the belief propagation graph convolution neural network (BPGCN)
Summary
Learning on graphs is an important task in machine learning and the broader data sciences, with many successful applications in various fields, including social sciences (e.g., social network analysis), biology (e.g., protein structure prediction and molecular fingerprint learning), and computer science (e.g., knowledge graph analysis). If the group information is known on a small subset of nodes, these nodes can serve as a training set, and the learning task is to determine the group membership of the remaining nodes by exploiting the direct group information in their attributes, as well as the indirect information in their relationships with the training nodes (the edge connectivities of the graph). This learning task is semi-supervised classification on graphs, a problem that has recently drawn much attention in both the network science and machine learning communities; for this problem, we have witnessed a burst of graph convolution neural networks (GCNs), a powerful neural network architecture that yields ground-breaking performance [1]. The unknown generative parameters of the JSBM graph can be learned in a standard classification approach, through the forward pass of the (truncated) BP equations together with the backward pass of the gradients of the loss function. This GCN algorithm, which we term BPGCN, is guaranteed to yield Bayes-optimal classification results [15] on synthetic graphs generated by the JSBM and performs comparably with state-of-the-art GCNs on real-world networks.
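To make the semi-supervised setup concrete, the following is a minimal sketch of a synthetic graph with planted groups, node features, and a small labeled training set. It rests on assumptions that the text above does not spell out: the joint model is taken to be a stochastic block model on the nodes combined with a bipartite block model between nodes and feature nodes, and the function names (`sample_jsbm`, `train_mask`) and parameters (`p_in`, `p_out`, `f_in`, `f_out`) are hypothetical names chosen for illustration, not the authors' notation or implementation.

```python
import numpy as np


def sample_jsbm(n, m, q, p_in, p_out, f_in, f_out, seed=0):
    """Hypothetical sampler (an assumed reading of the joint model):
    an SBM on n nodes plus a bipartite SBM between the n nodes and
    m feature nodes, both with q planted groups."""
    rng = np.random.default_rng(seed)
    node_group = rng.integers(q, size=n)  # hidden group of each node
    feat_group = rng.integers(q, size=m)  # hidden group of each feature

    # Node-node edges: probability p_in inside a group, p_out across groups.
    same = node_group[:, None] == node_group[None, :]
    prob = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each pair once
    A = (upper | upper.T).astype(np.int8)  # symmetric, no self-loops

    # Node-feature edges: probability f_in if groups match, f_out otherwise.
    same_f = node_group[:, None] == feat_group[None, :]
    F = (rng.random((n, m)) < np.where(same_f, f_in, f_out)).astype(np.int8)
    return A, F, node_group


def train_mask(n, fraction, seed=0):
    """Mark a small random subset of nodes as labeled (the training set)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n, dtype=bool)
    k = max(1, int(round(fraction * n)))
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


A, F, labels = sample_jsbm(n=200, m=50, q=2,
                           p_in=0.05, p_out=0.01, f_in=0.10, f_out=0.02)
mask = train_mask(200, fraction=0.05)
# A classifier would see A, F and labels[mask], and predict labels[~mask].
```

In this sketch the adjacency matrix `A` carries the relational information, the node-by-feature matrix `F` carries the single-point feature information, and only the labels selected by `mask` would be revealed to the classifier.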