Abstract

Structural learning of Bayesian networks (BNs) from observational data has seen growing applied use and attention across scientific and industrial fields. The mathematical theory of BNs and their optimization is well developed. Although several open-source BN learners are in the public domain, none can handle both small and large feature spaces while recovering network structures with acceptable accuracy. bAIcis® is a novel BN learning and simulation software from BERG. It was developed with the goal of learning BNs from “Big Data” in health care, which often exceeds hundreds of thousands of features when research is conducted in genomics or multi-omics. This article provides a comprehensive performance evaluation of bAIcis and compares it with the open-source BN learners. The study investigated synthetic datasets of discrete, continuous, and mixed data in both small and large feature spaces. The results demonstrated that bAIcis outperformed the publicly available algorithms in structure recovery precision in almost all of the evaluated settings, achieving true positive rates of 0.9 and precision of 0.8. In addition, bAIcis supports all data types, including continuous, discrete, and mixed variables. It is effectively parallelized on a distributed system and can work with datasets of thousands of features, which none of the publicly available tools can process at a desired level of recovery accuracy.

Highlights

  • Causal inference, the process of finding relationships that describe cause-and-effect events, involves inferring the consequences in a counterfactual reality where an alternative potential cause occurred (Pearl, 2010; Morgan and Winship, 2014)

  • Although there are several open-source Bayesian network (BN) learners in the public domain, none can handle both small and large feature spaces while recovering network structures with acceptable accuracy. bAIcis® is a novel BN learning and simulation software from BERG. It was developed with the goal of learning BNs from “Big Data” in health care, which often exceeds hundreds of thousands of features when research is conducted in genomics or multi-omics

  • Regarding true positive rate (TPR), all analyzed BN learners except Rimbanet were able to recover a comparable number of true edges, even in the 50-observation networks
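
The two structure-recovery metrics named above can be illustrated with a minimal sketch. It assumes the common definitions for directed edges (TPR is the share of true edges that were recovered, precision is the share of learned edges that are true); the article's exact scoring rules, for example how reversed edges are counted, are not given here, and the function name and toy edge lists are hypothetical.

```python
def edge_recovery_metrics(true_edges, learned_edges):
    """Return (TPR, precision) for directed edges given as (parent, child) pairs.

    Assumed definitions (not taken from the article):
      TPR       = correctly recovered edges / edges in the true network
      precision = correctly recovered edges / edges in the learned network
    """
    true_set = set(true_edges)
    learned_set = set(learned_edges)
    true_positives = len(true_set & learned_set)
    tpr = true_positives / len(true_set) if true_set else 0.0
    precision = true_positives / len(learned_set) if learned_set else 0.0
    return tpr, precision


# Hypothetical toy networks: 4 true edges, 5 learned edges, 3 of them correct.
true_edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]
learned_edges = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "A"), ("B", "D")]

tpr, precision = edge_recovery_metrics(true_edges, learned_edges)
print(f"TPR={tpr:.2f}, precision={precision:.2f}")  # TPR=0.75, precision=0.60
```

Under these definitions a learner can reach a high TPR simply by proposing many edges, which is why precision is reported alongside it.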

Summary

Introduction

Causal inference, the process of finding relationships that describe cause-and-effect events, involves inferring the consequences in a counterfactual reality where an alternative potential cause occurred (Pearl, 2010; Morgan and Winship, 2014). As Pearl pointed out, causal and statistical inferences have fundamental differences since they focus on causation and association, respectively (Pearl, 2009a). When compared with statistical inference, causation requires one step further to investigate the [...] Identifying causal relationships generally requires three levels of empirical evidence: temporal precedence, empirical association, and nonspurious relationships (Chambliss and Schutt, 2018). One traditional approach for testing causal hypotheses is to conduct a well-designed experiment, in which it is possible to control and intervene on the conditions, monitor the change in outcome, and reach a causal conclusion. A clinical trial is a typical example: it aims to demonstrate that a drug is the cause of improved outcomes. In certain scientific fields, such as epidemiology and social science, most studies are, by nature, observational rather than experimental (Rothman et al., 2008). In addition, in new domains such as climate research (Von Storch, 1999) and microarray measurements of gene expression (Nelson et al., 2004), where the number of measured variables can reach tens of thousands, performing that many experiments is costly, time-consuming, and resource-intensive even when experimental interventions are available.

BERG Health, Framingham, Massachusetts, USA.
