Abstract
The homophily assumption in graph theory posits that nodes with similar characteristics are more likely to form connections. This principle has made Graph Neural Networks (GNNs) vital tools for graph representation learning. However, many real-world graphs exhibit a phenomenon often termed neighbor class imbalance, characterized by frequent connections between dissimilar nodes, that is, low homophily. Classical GNNs tend to overlook this issue, which leads to a significant decline in performance. Prior research has attempted to address this challenge by employing high-order neighborhoods and filtering out dissimilar neighbors, yet these approaches pay little attention to homophily degree estimation and label utilization. In this work, we first examine the performance of classical GNNs on SynG-N, a synthetic graph with varying homophily degrees. We then introduce a novel method, HLA-GNN, which integrates homophily degree estimation and label utilization to enhance classical GNNs. The degrees of homophily between node pairs are estimated using a limited set of ground-truth labels, and these estimates can be integrated into classical GNNs to guide the message aggregation process. Drawing on the label propagation algorithm, we combine the partially observed class labels with the original features to enhance the feature space. Here, the observed class labels are randomly masked so that they serve both as feature augmentation and as a training signal. Our experimental results on eight datasets with varying degrees of homophily underscore the effectiveness of our method. HLA-GNN achieves a 12.69% to 34.19% improvement on low-homophily graphs, while maintaining competitive results in homophilous settings.
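To make the two ingredients named above more concrete, the following is a minimal illustrative sketch and not the HLA-GNN implementation. It assumes a graph given as a list of edges and integer class labels. The function `edge_homophily` computes the commonly used edge homophily ratio (the fraction of edges joining same-class nodes), which is a global summary rather than the pairwise estimator the abstract refers to. The function `augment_with_masked_labels` shows one plausible way to append one-hot labels to node features while randomly masking some observed labels. All function names, the `mask_rate` parameter, and the data layout are assumptions made for illustration.

```python
import random


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same class label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def augment_with_masked_labels(features, labels, train_nodes, num_classes,
                               mask_rate=0.5, seed=0):
    """Append one-hot labels to features, hiding a random subset of them.

    Hypothetical helper: nodes outside `train_nodes` and the randomly
    masked training nodes receive an all-zero label vector. The masked
    nodes are returned so their labels can serve as training targets.
    """
    rng = random.Random(seed)
    train_list = sorted(train_nodes)
    train_set = set(train_list)
    num_masked = int(round(mask_rate * len(train_list)))
    masked = set(rng.sample(train_list, num_masked))

    augmented = []
    for node, feat in enumerate(features):
        one_hot = [0.0] * num_classes
        # Only observed, unmasked labels are exposed as input features.
        if node in train_set and node not in masked:
            one_hot[labels[node]] = 1.0
        augmented.append(list(feat) + one_hot)
    return augmented, masked


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
    labels = [0, 0, 1, 1]
    print(edge_homophily(edges, labels))

    features = [[1.0], [2.0], [3.0], [4.0]]
    augmented, masked = augment_with_masked_labels(
        features, labels, train_nodes=[0, 1, 2], num_classes=2)
    print(augmented, masked)
```

In this sketch a low value of `edge_homophily` corresponds to the low-homophily setting described above, and the masked nodes returned by `augment_with_masked_labels` are the ones whose labels would be predicted during training rather than supplied as input.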