Abstract

Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm it proposes, and results across studies are often inconsistent in their claims about which algorithm is 'best'. This is partly because there is no agreed evaluation approach for determining their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with data that conforms to these assumptions, however unrealistic those assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this happens remains unknown. This paper investigates the performance of 15 state-of-the-art, well-established, or recent promising structure learning algorithms. We propose a methodology that applies the algorithms to data incorporating synthetic noise, in an effort to better understand the performance of structure learning algorithms when applied to real data. Each algorithm is tested over multiple case studies, sample sizes, and types of noise, and assessed with multiple evaluation criteria. This work involved learning approximately 10,000 graphs with a total structure learning runtime of seven months. In investigating the impact of data noise, we provide the first large-scale empirical comparison of BN structure learning algorithms under different assumptions of data noise. The results suggest that traditional synthetic performance may overestimate real-world performance by anywhere between 10% and more than 50%. They also show that while score-based learning is generally superior to constraint-based learning, a higher fitting score does not necessarily imply a more accurate causal graph.
The comparisons extend to other outcomes of interest, such as runtime, reliability, and resilience to noise, assessed over both small and large networks, and with both limited and big data. To facilitate comparisons with future studies, we have made all data, raw results, graphs and BN models freely available online.

Highlights

  • A Bayesian Network (BN) graph has two different interpretations

  • If we assume that the edges between variables represent some dependency that is not necessarily causal, the BN is viewed as a dependence graph that can be represented by a Completed Partially Directed Acyclic Graph (CPDAG), where undirected edges indicate relationships that produce identical posterior distributions irrespective of edge direction

  • Every BN structure learning algorithm is based on a set of assumptions, such as whether the data is complete and whether the variables are causally sufficient, and tends to be evaluated with synthetic data that conforms to these assumptions, however unrealistic those assumptions may be in the real world


Introduction

If we assume that the edges between variables represent causation, the BN is viewed as a unique Directed Acyclic Graph (DAG) with conditional distributions, referred to as a Causal Bayesian Network (CBN). If we instead assume that the edges between variables represent some dependency that is not necessarily causal, the BN is viewed as a dependence graph that can be represented by a Completed Partially Directed Acyclic Graph (CPDAG), where undirected edges indicate relationships that produce identical posterior distributions irrespective of edge direction. A CPDAG represents a set of Markov equivalent DAGs. One of the reasons CBNs have become popular in real-world applications is that they enable decision makers to reason with causal assumptions under uncertainty, which in turn enables them to simulate the effect of interventions and extend them to counterfactual reasoning [1] [2]. This paper focuses on assessing the various structure learning algorithms in terms of reconstructing the ground truth DAG, rather than in terms of reconstructing a Markov equivalence class that contains the true graph.
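The distinction between a DAG and its Markov equivalence class can be illustrated with a short sketch. By the standard characterisation, two DAGs are Markov equivalent if and only if they share the same skeleton (undirected edges) and the same v-structures (colliders X → Y ← Z with X and Z non-adjacent). The following self-contained Python snippet is an illustrative sketch, not code from the paper; the function names and graph encoding are our own assumptions.

```python
def skeleton(dag):
    """Undirected edge set of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in dag}

def v_structures(dag):
    """All colliders (x, child, z) where x and z are non-adjacent parents of child."""
    parents = {}
    for a, b in dag:
        parents.setdefault(b, set()).add(a)
    sk = skeleton(dag)
    vs = set()
    for child, ps in parents.items():
        for x in ps:
            for z in ps:
                # x < z avoids counting each unordered parent pair twice
                if x < z and frozenset((x, z)) not in sk:
                    vs.add((x, child, z))
    return vs

def markov_equivalent(d1, d2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

chain     = {("X", "Y"), ("Y", "Z")}   # X -> Y -> Z
rev_chain = {("Y", "X"), ("Z", "Y")}   # X <- Y <- Z
collider  = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z (a v-structure)

print(markov_equivalent(chain, rev_chain))  # True: same CPDAG  X - Y - Z
print(markov_equivalent(chain, collider))   # False: the collider is distinguishable
```

The first two graphs encode identical conditional independencies and belong to the same CPDAG, which is why an algorithm evaluated against the equivalence class cannot be penalised for orienting that edge either way; evaluating against the ground truth DAG, as this paper does, can.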
