The interplay between communities and homophily in semi-supervised classification using graph neural networks

Roman Kern,Tomislav Duricic,Hussain Hussain,Elisabeth Lex,Denis Helic

doi:10.1007/s41109-021-00423-1


Open Access

https://doi.org/10.1007/s41109-021-00423-1


Journal: Applied Network Science
Publication Date: Oct 26, 2021
Citations: 1
License type: open-access

Affiliation: Graz University of Technology, Know Center

Abstract

Graph Neural Networks (GNNs) are effective in many applications. Still, there is a limited understanding of the effect of common graph structures on the learning process of GNNs. To fill this gap, we study the impact of community structure and homophily on the performance of GNNs in semi-supervised node classification on graphs. Our methodology consists of systematically manipulating the structure of eight datasets, and measuring the performance of GNNs on the original graphs and the change in performance in the presence and the absence of community structure and/or homophily. Our results show the major impact of both homophily and communities on the classification accuracy of GNNs, and provide insights on their interplay. In particular, by analyzing community structure and its correlation with node labels, we are able to make informed predictions on the suitability of GNNs for classification on a given graph. Using an information-theoretic metric for community-label correlation, we devise a guideline for model selection based on graph structure. With our work, we provide insights on the abilities of GNNs and the impact of common network phenomena on their performance. Our work improves model selection for node classification in semi-supervised settings.

Highlights

Graphs are ubiquitous forms of data, which are encountered in many domains such as social networks, the web, citation networks, molecule interaction networks, and knowledge bases.
Contributions. Our contributions are two-fold: (1) we provide an extensive study of the interplay between community structure and homophily and their impact on classification with graph neural networks (GNNs), and (2) we propose to quantify the correlation between communities and labels using the uncertainty coefficient (Press et al 2007), and show that this measure predicts the suitability of GNNs for graph data.
GNN models. In our experiments, we study six GNN architectures that are widely used for semi-supervised classification on graphs: (a) Graph Convolutional Networks (GCN) (Kipf and Welling 2017), (b) Graph Sample and Aggregate (SAGE) (Hamilton et al 2017), (c) Graph Attention Networks (GAT) (Veličković et al 2018), (d) Simple Graph Convolutions (SGC) (Wu et al 2019), (e) Approximate Personalized Propagation of Neural Predictions (APPNP) (Klicpera et al 2019), and (f) Cluster Graph Neural Networks (CGCN) (Chiang et al 2019).
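The community-label correlation mentioned in the contributions can be illustrated with a small sketch. It assumes the uncertainty coefficient is taken in the direction U(L|C) = I(L;C) / H(L), that is, the fraction of label entropy explained by community membership; the summary above does not state the direction, so treat this as an assumption. The function names `entropy` and `uncertainty_coefficient` are illustrative and not taken from the paper's code, and the community assignments are assumed to come from any community detection method.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def uncertainty_coefficient(labels, communities):
    """Assumed form U(L|C) = I(L;C) / H(L), a value in [0, 1].

    labels[i] and communities[i] are the class label and the community
    id of node i. The log base cancels out in the ratio.
    """
    if len(labels) != len(communities):
        raise ValueError("labels and communities must have the same length")
    h_labels = entropy(labels)
    if h_labels == 0.0:
        # Convention chosen for this sketch: with a single label there is
        # nothing to predict, so report 0.
        return 0.0
    h_communities = entropy(communities)
    h_joint = entropy(list(zip(labels, communities)))
    # Mutual information via I(L;C) = H(L) + H(C) - H(L,C).
    mutual_information = h_labels + h_communities - h_joint
    return mutual_information / h_labels


# Communities that coincide with labels give a coefficient near 1;
# communities independent of labels give a coefficient near 0.
aligned = uncertainty_coefficient([0, 0, 1, 1], ["a", "a", "b", "b"])
independent = uncertainty_coefficient([0, 0, 1, 1], ["a", "b", "a", "b"])
```

Under this reading, a value close to 1 means that knowing a node's community almost determines its label, and a value close to 0 means communities carry little label information.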

Summary

Introduction

Graphs are ubiquitous forms of data, which are encountered in many domains such as social networks, the web, citation networks, molecule interaction networks, and knowledge bases. Machine learning on graphs has been an essential area of research. Developments in this area have solved, or advanced the state of the art in, many graph-related tasks, such as node classification (Kipf and Welling 2017), link prediction (Zhang and Chen 2018), and graph classification (Hamilton et al 2017). The state-of-the-art models for such tasks have predominantly been graph neural networks (GNNs) (Wu et al 2019). Communities affect information propagation in graphs: community structure can form barriers to this propagation, and propagation is the basis of processing with GNNs (Hasani-Mavriqi et al 2018). Neither community structure nor its relationship with homophily is well studied in the context of GNNs.

Methods

Results

Conclusion