Imbalanced node classification with Graph Neural Networks: A unified approach leveraging homophily and label information

Dingyang Lv,Zhengjia Xu,Jinghui Zhang,Yuchen Wang,Fang Dong

doi:10.1016/j.asoc.2023.110985

Dingyang Lv, Zhengjia Xu + Show 3 more

https://doi.org/10.1016/j.asoc.2023.110985

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

The homophily assumption in graph theory posits that nodes with similar characteristics have a higher tendency to form connections. This principle has rendered Graph Neural Networks (GNNs) as vital tools for graph representation learning. However, many real-world graphs may exhibit a phenomenon often termed as neighbor class imbalance, which is characterized by frequent connections between dissimilar nodes, a scenario reflecting low homophily. Classical GNNs tend to overlook this issue, leading to a significant decline in performance. Prior research has attempted to address this challenge by employing high-order neighborhoods and filtering out dissimilar neighbors, yet they have paid little attention to homophily degree estimation and label utilization. In this work, we initially explore the performance of classical GNNs on a synthetic graph with varying homophily degrees, designated as SynG-N. Following this, we introduce a novel method, HLA-GNN, which integrates homophily degree estimation and label utilization to enhance classical GNNs. The degrees of homophily between node pairs are estimated using a limited set of ground-truth labels, which can be integrated into classic GNNs to guide the message aggregation process. Drawing on the label propagation algorithm, we combine the partially observed class labels to enhance the original feature space. Here, the observed class labels are randomly masked as a feature augmentation and training signal. Our experimental results on eight datasets with varying degrees of homophily underscore the effectiveness of our method. HLA-GNN achieves a 12.69%∼34.19% improvement on low-homophily graphs, while maintaining competitive results in homophilous settings.

Full Text