Unifying graph neural networks causal machine learning and conformal prediction for robust causal inference in rail transport systems.

Abstract

Modeling the operational dynamics of intricate rail transit systems faces three significant challenges: addressing network-wide dependencies, differentiating correlation from causation, and accurately quantifying prediction uncertainty. Current methodologies generally tackle these issues separately, resulting in models that are either structurally simplistic, causally unclear, or overly confident in their forecasts. This paper presents a comprehensive framework that, for the first time, effectively combines Graph Neural Networks (GNNs), Causal Machine Learning (CML), and Conformal Prediction (CP) to resolve this dilemma. GNNs are utilized to capture the topological dependencies within the rail network, CML is employed to discern the unbiased causal impacts of operational interventions, and CP offers mathematically assured, distribution-free uncertainty intervals. Our empirical assessment using real-world operational data reveals a distinct differentiation in model performance: while GNN-enhanced hybrids excel in aggregate prediction tasks (CV R² ≈ 0.87), the proposed CML-CP framework realizes a transformative, order-of-magnitude decrease in causal effect estimation error (CV MAE: 124,758.04). Thus, the primary contribution of this research is not merely a singular "best" model, but rather a methodological roadmap that facilitates a paradigm shift from reactive data modeling to a proactive, strategic decision-support tool. This framework equips decision-makers to perform reliable what-if scenario analyses, supported by robust causal insights and valid uncertainty guarantees, leading to more resilient and efficient railway operations.
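
The distribution-free intervals mentioned in the abstract can be illustrated with split conformal prediction for regression. The following is a minimal sketch, not the paper's implementation: the `predict` function, the synthetic calibration data, and the choice of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Build a (1 - alpha) interval around predict(x_new) from held-out residuals."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q

# Hypothetical pre-fitted model and synthetic calibration data (assumptions).
random.seed(0)
predict = lambda x: 2.0 * x
calib_x = [random.uniform(0.0, 10.0) for _ in range(200)]
calib_y = [2.0 * x + random.gauss(0.0, 1.0) for x in calib_x]

low, high = split_conformal_interval(predict, calib_x, calib_y, 5.0, alpha=0.1)
print(f"90% interval at x = 5.0: [{low:.2f}, {high:.2f}]")
```

The coverage guarantee of this construction is marginal and relies on the calibration and test points being exchangeable, which is an assumption that would need checking for time-ordered rail data.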

Similar Papers
  • Conference Article
  • Cited by 12
  • 10.1109/ispass51385.2021.00029
Performance Analysis of Graph Neural Network Frameworks
  • Mar 1, 2021
  • Junwei Wu + 3 more

Graph neural networks (GNNs) are effective models to address learning problems on graphs and have been successfully applied to numerous domains. To improve the productivity of implementing GNNs, various GNN programming frameworks have been developed. Both the effectiveness (accuracy, loss, etc.) and the performance (latency, bandwidth, etc.) are essential metrics to evaluate the implementation of GNNs. There are many comparative studies related to the effectiveness of different GNN models on domain tasks. However, studies of the performance characteristics of different GNN frameworks are still lacking. In this study, we evaluate the effectiveness and performance of six popular GNN models, GCN, GIN, GAT, GraphSAGE, MoNet, and GatedGCN, across several common benchmarks under two popular GNN frameworks, PyTorch Geometric and Deep Graph Library. We analyze the training time, GPU utilization, and memory usage of different evaluation settings and the performance of models across different hardware configurations under the two frameworks. Our evaluation provides in-depth observations of performance bottlenecks of GNNs and the performance differences between the two popular GNN frameworks. Our work helps GNN researchers understand the performance differences of the popular GNN frameworks, and gives guidelines for developers to find potential performance bugs of frameworks and optimization possibilities of GNNs.

  • Research Article
  • 10.58286/29744
Stochastic estimation of wake-induced loads in wind farms using conformalized graph neural networks
  • Jul 1, 2024
  • e-Journal of Nondestructive Testing
  • Gregory Duthé + 2 more

In the evolving landscape of sustainable energy, wind power has become a cornerstone in the transition towards renewable energy sources. With significant advancements in turbine technology and simulation methods, wind farms are now a cost-effective alternative to traditional power plants. Nevertheless, optimizing the performance and lifespan of wind turbines remains challenging, especially when it comes to predicting and managing the cumulative load on turbine structures. Wake effects, which result from the complex aerodynamic interplay between turbines, reduce energy efficiency and increase mechanical stress on turbine components. A precise assessment of wake-induced loads is therefore vital for estimating the Remaining Useful Lifetime (RUL) of turbines. To address these challenges, we introduce a novel approach using Graph Neural Networks (GNNs) in combination with Conformal Predictors to model wind farms and evaluate wake-induced loads while estimating uncertainty. GNNs are particularly adept at capturing the complex interactions in a wind farm due to their ability to model graph-structured data. This enables a more accurate representation of the aerodynamic interactions between turbines. Alongside the GNN framework, we employ Conformal Predictors for uncertainty estimation. Conformal Predictors provide statistically valid prediction sets based on past data and minimal assumptions, allowing us to estimate uncertainty bounds with improved confidence. The fusion of GNNs with Conformal Predictors offers a robust framework for predicting turbine loads, while reliably quantifying the uncertainty associated with these predictions.

  • Research Article
  • 10.1038/s41746-025-01886-7
A scoping review of artificial intelligence applications in clinical trial risk assessment.
  • Jul 30, 2025
  • NPJ digital medicine
  • Douglas Teodoro + 4 more

Artificial intelligence (AI) is increasingly applied to clinical trial risk assessment, aiming to improve safety and efficiency. This scoping review analyzed 142 studies published between 2013 and 2024, focusing on safety (n = 55), efficacy (n = 46), and operational (n = 45) risk prediction. AI techniques, including traditional machine learning, deep learning (e.g., graph neural networks, transformers), and causal machine learning, are used for tasks like adverse drug event prediction, treatment effect estimation, and phase transition prediction. These methods utilize diverse data sources, from molecular structures and clinical trial protocols to patient data and scientific publications. Recently, large language models (LLMs) have seen a surge in applications, featuring in 7 out of 33 studies in 2023. While some models achieve high performance (AUROC up to 96%), challenges remain, including selection bias, limited prospective studies, and data quality issues. Despite these limitations, AI-based risk assessment holds substantial promise for transforming clinical trials, particularly through improved risk-based monitoring frameworks.

  • Research Article
  • Cited by 49
  • 10.1021/acs.jcim.1c00208
Deep Learning-Based Conformal Prediction of Toxicity.
  • May 27, 2021
  • Journal of Chemical Information and Modeling
  • Jin Zhang + 2 more

Predictive modeling for toxicity can help reduce risks in a range of applications and potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models also for toxicity predictions, we investigate the combination of deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors to those of the underlying machine learning models. The results indicate that highly predictive models can be obtained that result in very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding both statistical guarantees on the model performance as well as better predictions of the minority class compared to the underlying models.

  • Research Article
  • Cited by 10
  • 10.1186/s12911-024-02450-1
Prediction of emergency department revisits among child and youth mental health outpatients using deep learning techniques
  • Feb 8, 2024
  • BMC Medical Informatics and Decision Making
  • Simran Saggu + 8 more

Background: The proportion of Canadian youth seeking mental health support from an emergency department (ED) has risen in recent years. As EDs typically address urgent mental health crises, revisiting an ED may represent unmet mental health needs. Accurate ED revisit prediction could aid early intervention and ensure efficient healthcare resource allocation. We examine the potential increased accuracy and performance of graph neural network (GNN) machine learning models compared to recurrent neural network (RNN), and baseline conventional machine learning and regression models for predicting ED revisit in electronic health record (EHR) data. Methods: This study used EHR data for children and youth aged 4–17 seeking services at McMaster Children’s Hospital’s Child and Youth Mental Health Program outpatient service to develop and evaluate GNN and RNN models to predict whether a child/youth with an ED visit had an ED revisit within 30 days. GNN and RNN models were developed and compared against conventional baseline models. Model performance for GNN, RNN, XGBoost, decision tree and logistic regression models was evaluated using F1 scores. Results: The GNN model outperformed the RNN model by an F1-score increase of 0.0511 and the best performing conventional machine learning model by an F1-score increase of 0.0470. Precision, recall, receiver operating characteristic (ROC) curves, and positive and negative predictive values showed that the GNN model performed the best, and the RNN model performed similarly to the XGBoost model. Performance increases were more noticeable for recall and negative predictive value than for precision and positive predictive value. Conclusions: This study demonstrates the improved accuracy and potential utility of GNN models in predicting ED revisits among children and youth, although model performance may not be sufficient for clinical implementation. Given the improvements in recall and negative predictive value, GNN models should be further explored to develop algorithms that can inform clinical decision-making in ways that facilitate targeted interventions, optimize resource allocation, and improve outcomes for children and youth.

  • Research Article
  • 10.1609/aaai.v39i20.35425
Enhancing Trustworthiness of Graph Neural Networks with Rank-Based Conformal Training
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Ting Wang + 2 more

Graph Neural Networks (GNNs) have been widely used in a variety of fields because of their great potential in representing graph-structured data. However, the lack of rigorous uncertainty estimates limits their application in high-stakes settings. Conformal Prediction (CP) can produce statistically guaranteed uncertainty estimates by using the classifier's probability estimates to obtain prediction sets, which contain the true class with a user-specified probability. In this paper, we propose a rank-based CP-during-training framework for GNNs (RCP-GNN) that provides reliable uncertainty estimates and enhances the trustworthiness of GNNs in the node classification scenario. By exploiting rank information of the classifier's outcome, prediction sets with the desired coverage rate can be efficiently constructed. The strategy of CP during training with a differentiable rank-based conformity loss function is further explored to adapt prediction sets according to network topology information. In this way, the composition of prediction sets can be guided by the goal of jointly reducing inefficiency and probability estimation errors. Extensive experiments on several real-world datasets show that our model achieves any pre-defined target marginal coverage while significantly reducing the inefficiency compared with state-of-the-art methods.
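
To make the idea of a prediction set concrete, here is a minimal sketch of standard split conformal classification, using one minus the predicted probability of the true class as the nonconformity score. It is not the rank-based RCP-GNN method described above; the probabilities, labels, and function names are hypothetical.

```python
import math

def calibrate_threshold(calib_probs, calib_labels, alpha):
    """Pick the score threshold from held-out examples with known labels."""
    scores = [1.0 - probs[label] for probs, label in zip(calib_probs, calib_labels)]
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return sorted(scores)[rank - 1]

def prediction_set(probs, threshold):
    """Return every class whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical classifier outputs for four calibration nodes (assumptions).
calib_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
calib_labels = [0, 1, 2, 0]

threshold = calibrate_threshold(calib_probs, calib_labels, alpha=0.25)
print(prediction_set([0.5, 0.45, 0.05], threshold))  # ambiguous node: larger set
print(prediction_set([0.9, 0.05, 0.05], threshold))  # confident node: smaller set
```

Smaller sets at the same coverage level correspond to lower "inefficiency" in the abstract's terminology.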

  • Conference Article
  • Cited by 40
  • 10.1109/tpsisa52974.2021.00002
Membership Inference Attack on Graph Neural Networks
  • Dec 1, 2021
  • Iyiola E Olatunji + 2 more

Graph Neural Networks (GNNs), which generalize traditional deep neural networks on graph data, have achieved state-of-the-art performance on several graph analytical tasks. We focus on how trained GNN models could leak information about the member nodes that they were trained on. We introduce two realistic settings for performing a membership inference (MI) attack on GNNs. While choosing the simplest possible attack model that utilizes the posteriors of the trained model (black-box access), we thoroughly analyze the properties of GNNs and the datasets which dictate the differences in their robustness towards MI attack. While in traditional machine learning models, overfitting is considered the main cause of such leakage, we show that in GNNs the additional structural information is the major contributing factor. We support our findings by extensive experiments on four representative GNN models. To prevent MI attacks on GNNs, we propose two effective defenses that significantly decrease the attacker's inference by up to 60% without degradation to the target model's performance. Our code is available at https://github.com/iyempissy/rebMIGraph.

  • Research Article
  • 10.36001/phmconf.2023.v15i1.3467
Graph neural networks for dynamic modeling of roller bearings
  • Oct 26, 2023
  • Annual Conference of the PHM Society
  • Vinay Sharma + 3 more

Machine learning has paved the way for the real-time monitoring of complex infrastructure and industrial systems. However, purely data-driven methods have not been able to learn the underlying dynamics and generalize them to operating conditions that have not been covered by the training datasets. Therefore, they have not been able to predict the long-term evolution of the system state of physical systems. Physics-informed neural networks (PINNs) have recently shown promising results in predicting the system state evolution over extended periods of time, owing to the loss terms derived from the underlying partial differential equations governing the dynamics of the systems. However, PINNs have limited generalization ability, i.e., a model trained on one type of boundary condition cannot generalize to other conditions. Moreover, the governing equations used for describing the dynamics of physical systems are an approximation of reality, which can lead to differences between the predictions and the actual roll-out of the trajectory. Recently, graph neural networks (GNNs) have been applied to predict the evolution of system dynamics. Due to the encoded inductive bias, they generalize well to systems with varying configurations and boundary conditions. A message-passing GNN comprises two parts that learn the interaction between nodes: an edge network that takes the translation-invariant features between two nodes (e.g., the distance vector) and generates a message, and a node network that takes the aggregated messages from all the neighboring nodes and produces a new node state. This process is repeated several times until the final node state is decoded as a required output. 
 In the presented work, we propose to apply the framework of GNNs for predicting the dynamics of a rolling element bearing. The computational efficiency and generalizability of such a method enable the scalable use of a real-time digital twin to monitor the health state of a rotating machine. To this end, a GNN is used to mimic a dynamic spring-mass-damper model. Bearings consist of different interacting parts like the inner race, outer race, and multiple rolling elements. This interconnected and interacting architecture of a typical bearing is suitable to be modeled as a graph with nodes representing different components.
 We use the dynamic spring-mass-damper model to generate the training data for the GNN, where bearing components such as rolling elements, and inner and outer raceway are modeled as discrete masses. A Hertzian contact model is used to calculate the forces between these components. We evaluate the learning and generalization capabilities of the proposed GNN framework by testing bearing configurations different from the training configurations and comparing the performance to that of the spring-mass model.
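
The edge-network and node-network structure described in this abstract can be sketched as a single message-passing step. This is an illustrative toy, not the authors' model: `edge_network` and `node_network` are hypothetical fixed functions standing in for learned networks, and the three-node graph is made up.

```python
def edge_network(rel_pos):
    """Stand-in for a learned edge network: relative position -> message."""
    # Hypothetical fixed rule; a real model would use a trained MLP here.
    return [0.1 * d for d in rel_pos]

def node_network(state, aggregated):
    """Stand-in for a learned node network: old state + summed messages -> new state."""
    return [s + a for s, a in zip(state, aggregated)]

def message_passing_step(positions, states, edges):
    """Run one round of message passing over directed (sender, receiver) edges."""
    dim = len(next(iter(states.values())))
    aggregated = {node: [0.0] * dim for node in states}
    for sender, receiver in edges:
        # Using only the distance vector makes the message translation-invariant.
        rel = [ps - pr for ps, pr in zip(positions[sender], positions[receiver])]
        message = edge_network(rel)
        aggregated[receiver] = [a + m for a, m in zip(aggregated[receiver], message)]
    return {node: node_network(states[node], aggregated[node]) for node in states}

# Hypothetical three-node graph: node 0 is connected to nodes 1 and 2.
positions = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0]}
states = {node: [0.0, 0.0] for node in positions}
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]

new_states = message_passing_step(positions, states, edges)
print(new_states)
```

Repeating `message_passing_step` several times, feeding each output back in as `states`, corresponds to the repeated rounds mentioned in the abstract.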

  • Research Article
  • 10.1097/01.ee9.0000610760.14045.7c
Spatio-temporal air pollution models for national-scale health analysis
  • Oct 1, 2019
  • Environmental Epidemiology
  • Wang W + 3 more

PDS 65: Exposure assessment: implications for epidemiology, Exhibition Hall (PDS), Ground floor, August 27, 2019, 1:30 PM - 3:00 PM Background: A major challenge in epidemiological studies is producing accurate short-term air pollution predictions at fine spatial and temporal resolution across large geographic areas. The aim of this study is to develop spatio-temporal models to predict daily concentrations on a 25m grid for nitrogen dioxide (NO2), particulate matters including PM2.5 and PM10, and ozone (O3) for Great Britain from 2010–2015. Daily estimates can be averaged over other time periods (e.g. weekly, monthly, pregnancy trimester, annual) for different health analysis needs. Method: We developed generalised additive models (GAM) with penalised splines to describe spatial and temporal variations in daily concentrations of the pollutants. The models included Geographic Information System (GIS)-derived local-scale predictors and daily estimates (on a ~10km grid) from a chemical transport model (CTM). Model validation was performed using five-fold cross-validation. Results: The spatio-temporal model performance of O3 was strong (cross-validation variance (CV R2)=~0.82 and ~0.91 for daily and annual averaged estimates, respectively). The daily PM models also had high predictive accuracy (CV R2=~0.76 for both pollutants), however the models performed differently in annual estimates (CV R2=~0.66 and ~0.79 for PM2.5 and PM10, respectively). The predictive ability of daily NO2 models was relatively low (CV R2=~0.59), but was improved in annual estimates (CV R2=~0.73). For all pollutants, models performed consistently across study years, though the performance varied by site types, with weaker performance at traffic sites (CV R2=~0.56) compared to background sites (CV R2=~0.72). Frequent predictors that were included in the final models were mostly traffic-related. 
Conclusion: Our models overall performed well in estimating air pollution concentrations at fine spatial resolution in Great Britain. The models are currently being used in producing daily pollution surfaces. Daily values will be extracted from point of interest and used in air pollution exposure health studies (e.g. time-series analysis of daily hospital admissions).
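
The cross-validated R² values reported above can be illustrated with a small five-fold example. This is a sketch under stated assumptions: it pools out-of-fold predictions before computing R² (the abstract does not say whether folds were pooled or averaged), uses a one-variable least-squares line instead of a GAM, and runs on synthetic data.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cv_r2(xs, ys, k=5, seed=0):
    """Compute R^2 from pooled out-of-fold predictions over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = slope * xs[i] + intercept  # predict only unseen points
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): a noisy linear relationship.
random.seed(1)
xs = [random.uniform(0.0, 10.0) for _ in range(100)]
ys = [3.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]
print(f"5-fold CV R^2: {cv_r2(xs, ys):.3f}")
```

Because every prediction comes from a model that never saw that point, CV R² is typically lower than the in-sample R² of the same model.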

  • Conference Article
  • Cited by 6
  • 10.1145/3539597.3570480
Learning to Distill Graph Neural Networks
  • Feb 27, 2023
  • Cheng Yang + 8 more

Graph Neural Networks (GNNs) can effectively capture both the topology and attribute information of a graph, and have been extensively studied in many domains. Recently, there is an emerging trend that equips GNNs with knowledge distillation for better efficiency or effectiveness. However, to the best of our knowledge, existing knowledge distillation methods applied on GNNs all employed predefined distillation processes, which are controlled by several hyper-parameters without any supervision from the performance of distilled models. Such isolation between distillation and evaluation would lead to suboptimal results. In this work, we aim to propose a general knowledge distillation framework that can be applied on any pretrained GNN models to further improve their performance. To address the isolation problem, we propose to parameterize and learn distillation processes suitable for distilling GNNs. Specifically, instead of introducing a unified temperature hyper-parameter as most previous work did, we will learn node-specific distillation temperatures towards better performance of distilled models. We first parameterize each node's temperature by a function of its neighborhood's encodings and predictions, and then design a novel iterative learning process for model distilling and temperature learning. We also introduce a scalable variant of our method to accelerate model training. Experimental results on five benchmark datasets show that our proposed framework can be applied on five popular GNN models and consistently improve their prediction accuracies with 3.12% relative enhancement on average. Besides, the scalable variant enables 8 times faster training speed at the cost of 1% prediction accuracy.
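
The role of a distillation temperature can be shown with a temperature-scaled softmax. This sketch only illustrates how a per-node temperature changes the softened teacher distribution; the temperatures here are fixed, hypothetical values, whereas the paper learns them from each node's neighborhood.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one node and two candidate temperatures.
teacher_logits = [2.0, 1.0, 0.1]
node_temperatures = {"sharp": 1.0, "soft": 4.0}

for name, temperature in node_temperatures.items():
    probs = softmax_with_temperature(teacher_logits, temperature)
    print(name, [round(p, 3) for p in probs])
```

A higher temperature flattens the distribution, so the student sees more of the teacher's relative preferences among non-top classes; a lower one keeps the target close to a hard label.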

  • Research Article
  • Cited by 36
  • 10.1016/j.ress.2023.109341
Geometric deep learning for online prediction of cascading failures in power grids
  • Sep 1, 2023
  • Reliability Engineering & System Safety
  • Anna Varbella + 2 more

Past events have revealed that widespread blackouts are mostly a result of cascading failures in the power grid. Understanding the underlying mechanisms of cascading failures can help in developing strategies to minimize the risk of such events. Moreover, real-time detection of precursors to cascading failures will help operators take measures to prevent their propagation. Currently, the well-established probabilistic and physics-based models of cascading failures offer low computational efficiency, restricting them to use as offline tools. In this work, we develop a data-driven methodology for online estimation of the risk of cascading failures. We utilize a physics-based cascading failure model to generate a cascading failure dataset considering different operating conditions and failure scenarios, thus obtaining a sample space covering a large set of power grid states that are labeled as safe or unsafe. We use the synthetic data to train deep learning architectures, namely Feed-forward Neural Networks (FNN) and Graph Neural Networks (GNN). With the development of GNNs, improved performance is achieved with graph-structured data, and GNNs can generalize to graphs of diverse sizes. A comparison between FNN and GNN is made and the GNN's inductive capability is demonstrated via test grids. Furthermore, we apply transfer learning to improve the performance of a pre-trained GNN model on power grids not seen in the training process. The GNN model shows accuracy and balanced accuracy above 96% on selected test datasets not used in the training. In comparison, the FNN shows accuracy above 85% and balanced accuracy above 81% on test datasets unseen during training. Overall, the GNN model is successful in determining whether one or several simultaneous outages result in a critical grid state under specific grid operating conditions.

  • Preprint Article
  • 10.5194/egusphere-egu25-8155
Deep Learning for Radar Quantitative Precipitation Estimation over Complex Terrain in Southern China
  • Mar 18, 2025
  • Kexin Zhu + 1 more

Developing region-specific radar quantitative precipitation estimation (QPE) products for South China (SC) is crucial due to its unique climate and complex terrain. Deep learning (DL) has emerged as a promising avenue for radar QPE, especially graph neural networks (GNNs). Many studies have tested DL models in radar QPE, but virtually no studies have evaluated the performance of DL models across different precipitation intensities, types, or organizations. Moreover, limited attention has been given to whether DL-based methods can mitigate radar QPE errors caused by orographic influences in complex terrains, such as those in SC. This study investigates the advantages of DL methods for QPE tasks in South China, utilizing nearly three years of hourly gauge data as labels and ground-based radar reflectivity as inputs. First, multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and GNNs with similar architectures are constructed and compared to traditional Z-R relationships considering precipitation types. DL methods outperform traditional Z-R relationships, and GNNs perform the best. More importantly, this study conducts a systematic evaluation of the proposed GNN. For extreme precipitation (>30 mm/h), the GNN achieves the smallest MAE, highlighting its potential for hazardous event estimation. It also demonstrates stable performance for stratiform and organized precipitation, with minimal bias and standard deviation. However, the GNN is less effective for isolated precipitation, where CNNs are a better choice due to their ability to estimate scattered rainfall accurately. Finally, the Z-R relationship shows systematic spatial biases, overestimating precipitation in coastal plains and underestimating it in inland high-altitude regions. DL methods alleviate these terrain-induced biases by incorporating spatial information. 
Overall, this study highlights the advantages of DL methods across different precipitation scenarios and demonstrates their ability to mitigate systematic biases from complex terrain.

  • Research Article
  • Cited by 9
  • 10.1016/j.apenergy.2023.122099
Prediction of wind fields in mountains at multiple elevations using deep learning models
  • Oct 20, 2023
  • Applied Energy
  • Huanxiang Gao + 5 more


  • Research Article
  • Cited by 2
  • 10.1080/00207543.2025.2458121
Causal machine learning for supply chain risk prediction and intervention planning
  • Jan 28, 2025
  • International Journal of Production Research
  • Mateusz Wyrembek + 2 more

The ultimate goal for developing machine learning models in supply chain management is to make optimal interventions. However, most machine learning models identify correlations in data rather than inferring causation, making it difficult to systematically plan for better outcomes. In this article, we propose and evaluate the use of causal machine learning for developing supply chain risk intervention models, and demonstrate its use with a case study in supply chain risk management in the maritime engineering sector. Our findings highlight that causal machine learning enhances decision-making processes by identifying changes that can be achieved under different supply chain interventions, allowing ‘what-if’ scenario planning. We therefore propose different machine learning developmental pathways for predicting risk and planning for interventions to minimise risk and outline key steps for supply chain researchers to explore causal machine learning and harness its capabilities.

  • Research Article
  • Cited by 2
  • 10.1016/j.neunet.2024.106264
Migrate demographic group for fair Graph Neural Networks
  • Mar 23, 2024
  • Neural Networks
  • Yanming Hu + 5 more

