A Graphlet-based Explanation Generator for Graph Neural Networks Over Biological Datasets

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Background: The explainability of graph neural networks (GNNs), especially the explanation of edges and interactions among vertices, is demanding mainly owing to the dynamics of and groupings between vertices. Existing graph explainability methods ignore the analysis of the underlying model's weights over subgraphs and instead analyze sample-level explainability alone. Such sample-level explainability limits their generalizability, since it searches for the explaining behaviour directly in the input dataset. Objective: In this study, we propose a novel orbit-based GNN explainer (OExplainer), which integrates both sample-level and method-level approaches over a predetermined set of subgraphs. Through this analysis of subgraphs, our goal is to interpret graphs more comprehensively and intelligibly while providing an explainability score for each vertex of a particular graph instance. Methods: Our OExplainer decomposes the underlying graph neural network's weights into explaining subgraph bases while identifying and characterizing particular predictions. Through this characterization, we can carefully and accurately interpret the role of the predetermined graph orbits in determining vertex representations, and we can also clarify the method's behaviour over the whole input dataset. Moreover, we introduce novel vertex-specific scores in our subgraph-based approach over non-isomorphic graphlets. These vertex-specific scores encourage sample-level vertex improvement, which is tied to the GNN's vertex classification task. Results: Our experiments on simulated datasets confirm the importance of method weights, and of their decomposition, in explaining vertex classification. Detailed experiments on multiple real protein-protein interaction datasets and metabolic interaction networks also exhibit enhanced performance in vertex classification.
Conclusion: On both simulated and biological protein-protein interaction datasets, our approach outperforms the competing explanation approaches.
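The orbit idea at the core of the abstract can be illustrated with a toy sketch: for the two 3-vertex graphlets (the open wedge and the triangle), every vertex can be scored by how often it occupies each orbit. This is an illustrative sketch only, not the paper's actual scoring method; the function names and the restriction to 3-vertex graphlets are our own assumptions.

```python
from itertools import combinations

def orbit_features(edges):
    """Toy 3-vertex graphlet orbit counts per vertex (illustrative sketch,
    not the paper's exact method). Each vertex gets a degree count, a
    wedge-center count (open 2-paths centred on it), and a triangle count."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    feats = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        # a pair of neighbours either closes a triangle or forms an open wedge
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        feats[v] = {"degree": d,
                    "wedge_center": d * (d - 1) // 2 - tri,
                    "triangle": tri}
    return feats

# triangle 0-1-2 with a pendant vertex 3 attached to 2
feats = orbit_features([(0, 1), (1, 2), (0, 2), (2, 3)])
```

In a full explainer, such orbit counts would be only the structural vocabulary; the paper's contribution lies in tying model weights to these subgraph bases.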

Similar Papers
  • Conference Article
  • Citations: 1
  • 10.1109/dsaa54385.2022.10032447
Graph Summarization as Vertex Classification Task using Graph Neural Networks vs. Bloom Filter
  • Oct 13, 2022
  • M Blasi + 4 more

The goal of graph summarization is to represent large graphs in a structured and compact way. A graph summary based on equivalence classes preserves predefined features of each vertex within a k-hop neighborhood, such as the vertex and edge labels. Based on these neighborhood characteristics, the vertex is assigned to an equivalence class. The calculation of the assigned equivalence class must be a permutation invariant operation on the predefined features. This is typically achieved by sorting on the feature values, which is computationally expensive, and subsequently hashing the result. Graph Neural Networks (GNNs) fulfill the permutation invariance requirement. We formulate the problem of graph summarization as a subgraph classification task on the root vertex of the k-hop neighborhood. We adapt different GNN architectures, both based on the popular message-passing protocol and alternative approaches, to perform the structural graph summarization task. We compare different GNNs with a standard multi-layer perceptron (MLP) and Bloom filter as a non-neural method. We consider four popular graph summary models on a large web graph. This resembles challenging multi-class vertex classification tasks with the numbers of classes ranging from 576 to hundreds of thousands. Our results show that the performance of the GNNs is close to each other. In three out of four experiments, the non-message-passing Graph-MLP model outperforms the other GNNs. The performance of the standard MLP is extraordinarily good, especially in the presence of many classes. Finally, the Bloom filter outperforms all neural architectures by a large margin, except for the dataset with the fewest number (576) of classes. This is an interesting result, since it sheds light on how well and in which contexts GNNs are suited for graph summarization. Furthermore, it demonstrates the need for considering strong non-neural baselines for standard GNN tasks such as vertex classification.
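The sort-then-hash baseline that this abstract contrasts with GNNs can be sketched in a few lines; the encoding below (a repr of a sorted tuple fed to SHA-256) is our own illustrative assumption, not the paper's implementation:

```python
import hashlib

def summary_class(vertex_label, neighbor_labels):
    """Permutation-invariant equivalence-class id for a 1-hop summary.
    Sorting the neighbor labels makes the result independent of the
    order in which neighbors are visited; the hash compacts the class id."""
    canon = (vertex_label, tuple(sorted(neighbor_labels)))
    return hashlib.sha256(repr(canon).encode()).hexdigest()[:12]

cls1 = summary_class("A", ["x", "y"])
cls2 = summary_class("A", ["y", "x"])  # same neighborhood, permuted
```

The sorting step is exactly the cost the paper tries to avoid by letting a permutation-invariant GNN (or a Bloom filter) compute the class directly.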

  • Conference Article
  • Citations: 99
  • 10.24963/ijcai.2021/353
UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks
  • Aug 1, 2021
  • Jing Huang + 1 more

Hypergraph, an expressive structure with flexibility to model the higher-order correlations among entities, has recently attracted increasing attention from various research domains. Despite the success of Graph Neural Networks (GNNs) for graph representation learning, how to adapt the powerful GNN-variants directly into hypergraphs remains a challenging problem. In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models into hypergraphs. In this framework, meticulously-designed architectures aiming to deepen GNNs can also be incorporated into hypergraphs with the least effort. Extensive experiments have been conducted to demonstrate the effectiveness of UniGNN on multiple real-world datasets, outperforming the state-of-the-art approaches by a large margin. Especially for the DBLP dataset, we increase the accuracy from 77.4% to 88.8% in the semi-supervised hypernode classification task. We further prove that the proposed message-passing based UniGNN models are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs. Our code is available at https://github.com/OneForward/UniGNN.
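The unified two-stage message passing that UniGNN formalizes (hyperedges first aggregate their member vertices, then vertices aggregate their incident hyperedges) can be sketched roughly as follows; mean aggregation and all names here are our own illustrative assumptions, not the paper's API:

```python
def mean_agg(vectors):
    """Element-wise mean of a list of equal-length feature lists."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def unignn_layer(X, hyperedges, agg=mean_agg):
    """Sketch of two-stage hypergraph message passing:
    stage 1: each hyperedge aggregates its member vertices' features;
    stage 2: each vertex aggregates the features of its incident hyperedges.
    On an ordinary graph (all hyperedges of size 2) this reduces to
    standard neighborhood aggregation, which is the unifying idea."""
    E = {e: agg([X[v] for v in verts]) for e, verts in hyperedges.items()}
    out = {}
    for v in X:
        incident = [E[e] for e, verts in hyperedges.items() if v in verts]
        out[v] = agg(incident) if incident else X[v]
    return out

X = {0: [1.0], 1: [3.0], 2: [5.0]}
H = {"e1": [0, 1], "e2": [1, 2]}
out = unignn_layer(X, H)
```

A real layer would add learnable weights and a nonlinearity around each stage; the sketch only shows the data flow the framework unifies.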

  • Research Article
  • 10.1609/aaai.v38i11.29082
Improving GNN Calibration with Discriminative Ability: Insights and Strategies
  • Mar 24, 2024
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Yujie Fang + 3 more

The widespread adoption of Graph Neural Networks (GNNs) has led to an increasing focus on their reliability. To address the issue of underconfidence in GNNs, various calibration methods have been developed to gain notable reductions in calibration error. However, we observe that existing approaches generally fail to enhance consistently, and in some cases even deteriorate, GNNs' ability to discriminate between correct and incorrect predictions. In this study, we advocate the significance of discriminative ability and the inclusion of relevant evaluation metrics. Our rationale is twofold: 1) Overlooking discriminative ability can inadvertently compromise the overall quality of the model; 2) Leveraging discriminative ability can significantly inform and improve calibration outcomes. Therefore, we thoroughly explore the reasons why existing calibration methods are ineffective, and sometimes even detrimental, with regard to the discriminative ability of GNNs. Building upon these insights, we conduct GNN calibration experiments across multiple datasets using a straightforward example model, denoted as DC(GNN). Its excellent performance confirms the potential of integrating discriminative ability as a key consideration in the calibration of GNNs, thereby establishing a pathway toward more effective and reliable network calibration.

  • Research Article
  • Citations: 9
  • 10.1016/j.eswa.2021.114655
Node classification using kernel propagation in graph neural networks
  • Feb 4, 2021
  • Expert Systems with Applications
  • Sakthi Kumar Arul Prakash + 1 more


  • Conference Article
  • Citations: 38
  • 10.1145/3511808.3557356
Imbalanced Graph Classification via Graph-of-Graph Neural Networks
  • Oct 17, 2022
  • Yu Wang + 3 more

Graph Neural Networks (GNNs) have achieved unprecedented success in learning graph representations to identify categorical labels of graphs. However, most existing graph classification problems with GNNs follow a balanced data splitting protocol, which is misaligned with many real-world scenarios in which some classes have much fewer labels than others. Directly training GNNs under this imbalanced situation may lead to uninformative representations of graphs in minority classes, and compromise the overall performance of downstream classification, which signifies the importance of developing effective GNNs for handling imbalanced graph classification. Existing methods are either tailored for non-graph structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G²GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from graphs themselves. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations, which are initially obtained by node-level propagation with pooling via a GNN encoder. Locally, we employ topological augmentation via masking nodes or dropping edges to improve the model generalizability in discerning topology of unseen testing graphs. Extensive graph classification experiments conducted on seven benchmark datasets demonstrate our proposed G²GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores. The implementation of G²GNN is available at https://github.com/YuWVandy/G2GNN.

  • Book Chapter
  • Citations: 14
  • 10.1137/1.9781611977653.ch18
RELIANT: Fair Knowledge Distillation for Graph Neural Networks
  • Jan 1, 2023
  • Yushun Dong + 5 more

Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility. Open-source code can be found at https://github.com/yushundong/RELIANT.
Keywords: Graph Neural Networks, Algorithmic Fairness, Knowledge Distillation

  • Research Article
  • 10.1016/j.mrgentox.2025.503858
AI/ML modeling to enhance the capability of in vitro and in vivo tests in predicting human carcinogenicity.
  • Apr 1, 2025
  • Mutation research. Genetic toxicology and environmental mutagenesis
  • Ani Tevosyan + 9 more


  • Conference Article
  • Citations: 2
  • 10.1109/ijcnn52387.2021.9533355
Impute Gene Expression Missing Values via Biological Networks: Optimal Fusion of Data and Knowledge
  • Jul 18, 2021
  • Mingrong Xiang + 4 more

Gene expression data often contain missing values that, if not handled properly, may mislead or invalidate the downstream analyses. With the emergence of graph neural networks (GNN), domain knowledge about gene regulation can be leveraged to guide the missing data imputation. We show in this paper, however, that naive application of GNN on the raw gene-expression data can actually lead to worse imputation. We analyse this problem considering both the intrinsic property of GNN message passing and potential data-knowledge inconsistency. We propose two measures towards optimal integration of biological networks in the gene-expression missing data imputation. These include expression data normalisation and a weighting scheme for GNN message passing. Experiments on two different biological networks and gene expression datasets show that our method outperforms state-of-the-art generic imputation algorithms and alternative GNN models, obtaining lower mean absolute error (MAE) consistently.
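The weighted message-passing idea sketched in this abstract can be illustrated with a minimal neighbour-mean imputer. This is our own simplification; the paper's normalisation and weighting scheme are more involved, and all names below are hypothetical:

```python
def impute_missing(values, adj, weights):
    """Fill missing expression values (None) with the weighted mean of
    observed neighbours in a biological network (toy sketch of imputation
    as one round of weighted message passing)."""
    out = dict(values)
    for v, x in values.items():
        if x is None:
            # gather (weight, value) pairs from observed neighbours only
            obs = [(weights.get((v, u), 1.0), values[u])
                   for u in adj.get(v, []) if values[u] is not None]
            if obs:
                total = sum(w for w, _ in obs)
                out[v] = sum(w * val for w, val in obs) / total
    return out

values = {0: 2.0, 1: None, 2: 4.0}
adj = {1: [0, 2]}                       # gene 1's network neighbours
weights = {(1, 0): 1.0, (1, 2): 3.0}    # edge confidences from the knowledge graph
imputed = impute_missing(values, adj, weights)
```

The paper's point is that this only works well after normalising the expression data and weighting edges to handle data-knowledge inconsistency, which the `weights` dictionary stands in for here.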

  • Book Chapter
  • Citations: 7
  • 10.1007/978-3-031-11154-9_11
BioGNN: How Graph Neural Networks Can Solve Biological Problems
  • Sep 27, 2022
  • Pietro Bongini + 3 more

Graph Neural Networks (GNNs) have known an important and fast development in the last decade, with many theoretical and practical innovations. Their main feature is the capability of processing graph structured data with minimal loss of structural information. This makes GNNs the ideal family of models for processing a wide variety of biological data: metabolic networks, structural formulas of molecules, and proteins are all examples of biological data that are naturally represented as graphs. As an example, GNNs were employed, with very good results, for the prediction of protein-protein interactions. This was achieved by applying a clique detection model on graphs representing the interaction of the secondary structures of pairs of proteins. The introduction of composite GNN models, designed for processing heterogeneous graphs, has allowed researchers to study even more complex networks. For instance, drug side-effects were predicted based on a graph describing the interactions between drugs and human genes. Another very important innovation was brought by generative models, that were introduced for graph data after the success of generative models for images. In particular, GNNs were used to build a sequential model for the generation of potential drug candidates, in the form of molecular graphs, with the purpose of enhancing existing drug discovery techniques. The increasing accuracy and efficacy of these models, as well as the development of more complex biological databases, ensure even more interesting future developments in the application of GNNs to biological data.

  • Research Article
  • Citations: 3
  • 10.1109/tpami.2024.3379251
PAGE: Prototype-Based Model-Level Explanations for Graph Neural Networks.
  • Oct 1, 2024
  • IEEE transactions on pattern analysis and machine intelligence
  • Yong-Min Shin + 2 more

Aside from graph neural networks (GNNs) attracting significant attention as a powerful framework revolutionizing graph representation learning, there has been an increasing demand for explaining GNN models. Although various explanation methods for GNNs have been developed, most studies have focused on instance-level explanations, which produce explanations tailored to a given graph instance. In our study, we propose Prototype-bAsed GNN-Explainer (PAGE), a novel model-level GNN explanation method that explains what the underlying GNN model has learned for graph classification by discovering human-interpretable prototype graphs. Our method produces explanations for a given class, thus being capable of offering more concise and comprehensive explanations than those of instance-level explanations. First, PAGE selects embeddings of class-discriminative input graphs on the graph-level embedding space after clustering them. Then, PAGE discovers a common subgraph pattern by iteratively searching for high matching node tuples using node-level embeddings via a prototype scoring function, thereby yielding a prototype graph as our explanation. Using six graph classification datasets, we demonstrate that PAGE qualitatively and quantitatively outperforms the state-of-the-art model-level explanation method. We also carry out systematic experimental studies by demonstrating the relationship between PAGE and instance-level explanation methods, the robustness of PAGE to input data scarce environments, and the computational efficiency of the proposed prototype scoring function in PAGE.

  • Conference Article
  • Citations: 38
  • 10.1109/sc41405.2020.00074
Reducing Communication in Graph Neural Network Training
  • Nov 1, 2020
  • Alok Tripathy + 2 more

Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
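The core kernel behind these parallel algorithms, a sparse-dense matrix multiply under a 1D row partition, can be sketched as follows. This is a toy serial simulation of the distributed computation (in the paper it runs via torch.distributed on GPUs); the sparse-row representation and function name are our own assumptions:

```python
def spmm_1d(A_rows_per_rank, H):
    """Sketch of a 1D-partitioned sparse-dense multiply A @ H.
    Each 'rank' owns a contiguous block of sparse rows of A (each row a
    {col: val} dict). After an all-gather of the dense feature matrix H
    (simulated here by simply sharing H), every rank computes its output
    rows independently, with no further communication."""
    out = []
    for rank_rows in A_rows_per_rank:   # loop simulates the parallel ranks
        for row in rank_rows:
            out.append([sum(val * H[c][j] for c, val in row.items())
                        for j in range(len(H[0]))])
    return out

# 2 ranks, each owning one sparse row of a 2x2 adjacency-like matrix
A = [[{0: 1.0, 1: 1.0}],   # rank 0's row: sums both feature rows
     [{1: 2.0}]]           # rank 1's row: scales feature row 1
H = [[1.0, 2.0], [3.0, 4.0]]
result = spmm_1d(A, H)
```

The 1.5D/2D/3D variants in the paper trade replication of H for reduced communication volume; the per-row arithmetic stays the same.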

  • Research Article
  • 10.1145/3638057
Correlation-aware Graph Data Augmentation with Implicit and Explicit Neighbors
  • Feb 27, 2024
  • ACM Transactions on Knowledge Discovery from Data
  • Chuan-Wei Kuo + 4 more

In recent years, there has been a significant surge in commercial demand for citation graph-based tasks, such as patent analysis, social network analysis, and recommendation systems. Graph Neural Networks (GNNs) are widely used for these tasks due to their remarkable performance in capturing topological graph information. However, GNNs’ output results are highly dependent on the composition of local neighbors within the topological structure. To address this issue, we identify two types of neighbors in a citation graph: explicit neighbors based on the topological structure and implicit neighbors based on node features. Our primary motivation is to clearly define and visualize these neighbors, emphasizing their importance in enhancing graph neural network performance. We propose a Correlation-aware Network (CNet) to re-organize the citation graph and learn more valuable informative representations by leveraging these implicit and explicit neighbors. Our approach aims to improve graph data augmentation and classification performance, with the majority of our focus on stating the importance of using these neighbors, while also introducing a new graph data augmentation method. We compare CNet with state-of-the-art (SOTA) GNNs and other graph data augmentation approaches acting on GNNs. Extensive experiments demonstrate that CNet effectively extracts more valuable informative representations from the citation graph, significantly outperforming baselines. The code is available on public GitHub.

  • Research Article
  • 10.3390/app14209333
Dynamic Link Prediction in Jujube Sales Market: Innovative Application of Heterogeneous Graph Neural Networks
  • Oct 13, 2024
  • Applied Sciences
  • Yichang Wu + 4 more

Link prediction is crucial in forecasting potential distribution channels within the dynamic and heterogeneous Xinjiang jujube sales market. This study utilizes knowledge graphs to represent entities and constructs a complex network model for market analysis. Graph neural networks (GNNs) have shown excellent performance in handling graph-structured data, but they do not necessarily significantly outperform in link prediction tasks due to an overreliance on node features and a neglect of structural information. Additionally, the Xinjiang jujube dataset exhibits unique complexity, including multiple types, attributes, and relationships, distinguishing it from typical GNN datasets such as DBLP and protein-protein interaction datasets. To address these challenges, we introduce the Heterogeneous Multi-Head Attention Graph Neural Network model (HMAGNN). Our methodology involves mapping isomeric nodes to common feature space and labeling nodes using an enhanced Weisfeiler–Lehman (WL) algorithm. We then leverage HMAGNN to learn both structural and attribute features individually. Throughout our experimentation, we identify the critical influence of local subgraph structure and size on link prediction outcomes. In response, we introduce virtual nodes during the subgraph extraction process and conduct validation experiments to underscore the significance of these factors. Compared to alternative models, HMAGNN excels in capturing structural features through our labeling approach and dynamically adapts to identify the most pertinent link information using a multi-head attention mechanism. Extensive experiments on benchmark datasets consistently demonstrate that HMAGNN outperforms existing models, establishing it as a state-of-the-art solution for link prediction in the context of jujube sales market analysis.

  • Research Article
  • 10.63503/j.ijssic.2025.45
Optimized Gated Fusion Adaptive Graph Neural Network for Predicting Water Quality in Smart Environments
  • Feb 5, 2025
  • International Journal on Smart & Sustainable Intelligent Computing
  • Shaik Mahaboob Basha + 1 more

Still, an effective water prediction system is essential to support sustainable solutions for water management in smart environments. Increased accuracy of real-time estimates can enhance decision-making and thus lead to smaller health hazards and less damage to the environment. Therefore, the Optimized Gated Fusion Adaptive Graph Neural Network (GFAGNN) is presented in this paper for water quality prediction, which adopts graph structures and neural networks to analyze hitherto unseen nonlinear interactions among water parameters. Similarly, this paper outlines the inadequacies of conventional criterion-based extrapolation models and the importance of a flexible graph-based model. The proposed GFAGNN integrates multiple sources into a comprehensive one, and it also applies the gated fusion mechanism to improve the model's performance when operating in dynamic scenarios. To illustrate the enhancements, a comparison between GFAGNN and a comparative model, a Convolutional Neural Network (CNN), is included in this paper. From experimental outcomes, it has been observed that the proposed GFAGNN outperforms the existing models in terms of accuracy and robustness in calculating water quality indices. The effectiveness of the proposed model is confirmed through multiple computations, simulation, and dataset analysis. Altogether, the results point to the effectiveness of graph neural networks concerning the development of smart water management systems.

  • Research Article
  • Citations: 13
  • 10.1007/s13218-022-00781-7
Generating Explanations for Conceptual Validation of Graph Neural Networks: An Investigation of Symbolic Predicates Learned on Relevance-Ranked Sub-Graphs
  • Nov 7, 2022
  • Kunstliche Intelligenz
  • Bettina Finzel + 5 more

Graph Neural Networks (GNN) show good performance in relational data classification. However, their contribution to concept learning and the validation of their output from an application domain’s and user’s perspective have not been thoroughly studied. We argue that combining symbolic learning methods, such as Inductive Logic Programming (ILP), with statistical machine learning methods, especially GNNs, is an essential forward-looking step to perform powerful and validatable relational concept learning. In this contribution, we introduce a benchmark for the conceptual validation of GNN classification outputs. It consists of the symbolic representations of symmetric and non-symmetric figures that are taken from a well-known Kandinsky Pattern data set. We further provide a novel validation framework that can be used to generate comprehensible explanations with ILP on top of the relevance output of GNN explainers and human-expected relevance for concepts learned by GNNs. Our experiments conducted on our benchmark data set demonstrate that it is possible to extract symbolic concepts from the most relevant explanations that are representative of what a GNN has learned. Our findings open up a variety of avenues for future research on validatable explanations for GNNs.
