Abstract

Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQs) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not account for uncertainty or, when they do, make simplifying assumptions rather than building full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error compared to existing GNN-based approaches.
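The abstract names a sampling framework for computing the expected value of an AGQ but does not give its details. The sketch below illustrates the general idea under assumptions of our own, not the paper's method: a binary pairwise Markov random field with a homophily coupling stands in for a PSL or MLN model, plain Gibbs sampling stands in for the paper's sampler, and `gibbs_samples`, `intra_group_edges`, and `expected_query` are hypothetical names. Observed labels are clamped, joint samples of the unobserved labels are drawn, and the query is averaged over those samples instead of being evaluated once on a single imputed labelling.

```python
import math
import random


def gibbs_samples(nodes, edges, observed, unary, coupling,
                  num_samples, burn_in=100, seed=0):
    """Draw joint label samples from a toy binary pairwise MRF (assumed model).

    P(y) is proportional to
        exp(sum_i unary[i] * y_i + coupling * #{(i, j) in edges : y_i == y_j}),
    with observed nodes clamped to their known labels.
    """
    rng = random.Random(seed)
    nbrs = {v: [] for v in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # Observed nodes keep their labels; hidden nodes start from random labels.
    y = {v: observed[v] if v in observed else rng.randint(0, 1) for v in nodes}
    hidden = [v for v in nodes if v not in observed]
    samples = []
    for sweep in range(burn_in + num_samples):
        for v in hidden:
            # Log-potential of each label for v given its neighbours' labels.
            score = [0.0, unary.get(v, 0.0)]
            for lab in (0, 1):
                score[lab] += coupling * sum(1 for u in nbrs[v] if y[u] == lab)
            d = score[1] - score[0]
            # Numerically stable sigmoid: P(y_v = 1 | all other labels).
            if d >= 0:
                p1 = 1.0 / (1.0 + math.exp(-d))
            else:
                p1 = math.exp(d) / (1.0 + math.exp(d))
            y[v] = 1 if rng.random() < p1 else 0
        if sweep >= burn_in:
            samples.append(dict(y))  # one joint sample per post-burn-in sweep
    return samples


def intra_group_edges(labels, edges, group=1):
    """Hypothetical cohesion-style AGQ: edges with both endpoints in `group`."""
    return sum(1 for u, v in edges if labels[u] == group and labels[v] == group)


def expected_query(samples, query):
    """Monte Carlo estimate of E[query(Y)] from joint samples."""
    return sum(query(s) for s in samples) / len(samples)


# Usage on a small, partially observed toy graph.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]


def q(labels):
    return intra_group_edges(labels, edges)


samples = gibbs_samples(nodes, edges, {"a": 1}, unary={}, coupling=1.0,
                        num_samples=2000)
print(expected_query(samples, q))
```

Because every sample is a full joint labelling, correlations between neighbouring labels are reflected in the estimate, which a single imputed labelling cannot capture.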

Highlights

  • Large real-world graphs in domains such as social media, computational biology, and IoT often have missing information that needs to be inferred

  • Statistical relational learning (SRL) approaches, such as probabilistic soft logic (PSL) and Markov logic networks (MLNs), and probabilistic graph neural network (GNN) approaches, such as graph Markov neural networks (GMNNs), model the joint distribution over all unobserved node labels and impute node labels using the mode or the mean of the joint distribution

  • We motivate the practical need for aggregate graph queries (AGQs) and show that existing approaches, which optimize for locally decomposable metrics such as accuracy, perform poorly on AGQs both theoretically and empirically
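A deliberately tiny worked example makes the gap between the two estimators concrete. The setup is an assumption for illustration only, not one of the paper's models: three nodes on a triangle, each labelled 1 independently with probability 0.6, and an AGQ that counts edges whose endpoints are both labelled 1. Imputing each node with its most likely label (the choice that maximizes expected per-node accuracy) and then querying gives 3, whereas the expected value of the query is 3 × 0.6 × 0.6 = 1.08. Exact enumeration is feasible here only because the graph has three nodes; models that couple labels across edges are why a sampling framework is needed.

```python
from itertools import product

# Toy model (assumption): labels are independent, P(label = 1) = 0.6.
P1 = 0.6
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]


def query(labels):
    """AGQ: number of edges whose endpoints are both labelled 1."""
    return sum(1 for u, v in edges if labels[u] == 1 and labels[v] == 1)


def prob(labels):
    """Joint probability of a full labelling under the toy model."""
    p = 1.0
    for lab in labels:
        p *= P1 if lab == 1 else 1.0 - P1
    return p


# Point estimate: impute every node with its most likely label, then query.
mode_labels = [1 if P1 >= 0.5 else 0 for _ in nodes]
point_estimate = query(mode_labels)

# Expectation: weight the query on all 2^3 labellings by their probability.
expected = sum(prob(y) * query(y) for y in product((0, 1), repeat=len(nodes)))

print(point_estimate, round(expected, 2))  # 3 1.08
```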


Summary

Introduction

Large real-world graphs in domains such as social media (e.g., friendship and follower graphs), computational biology (e.g., protein interaction networks), and IoT (e.g., sensor networks) often have missing information that needs to be inferred. Global graph properties can be computed using complex graph queries. While many such graph properties have been proposed (Scott, 1988; Wasserman & Faust, 1994; Cook & Holder, 2006; Rajaraman & Ullman, 2011), along with efficient algorithms to estimate them (Shi et al., 2015; Liu et al., 2018; Wu et al., 2014; Qiang et al., 2014; Dunne & Shneiderman, 2013), the task of estimating these queries when information such as node labels is missing has received little attention.

Statistical relational learning
Markov logic networks
Probabilistic soft logic
Graph neural networks
Graph convolutional networks
Graph attention networks
Graph Markov neural networks
Problem definition
Aggregate graph queries
Point estimation approach
Expectation‐based approach
Analysis of the estimation approaches
Expectation‐based approach for PSL
Empirical evaluation
Experimental setup and datasets
Performance on AGQs
Methods
Trade‐off between estimating AGQs and locally decomposable metrics
Conclusion and future work