Abstract
One of the most challenging tasks when adopting Bayesian networks (BNs) is learning their structure from data. This task is complicated by the huge search space of possible solutions and by the fact that the problem is NP-hard. Hence, a full enumeration of all the possible solutions is not always feasible and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before. For this reason, in this work, we provide a detailed comparison of many different state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables and with different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference, and the statistical pitfalls within them.
Highlights
Bayesian networks (BNs) have been applied to several different fields, ranging from water resource management [1] to the discovery of gene regulatory networks [2, 3].
To evaluate the obtained results, we considered both false positives (FP) and false negatives (FN).
From the obtained results, it is straightforward to notice that methods including more edges in the inferred networks are more prone to errors in terms of accuracy, which may reflect a bias of this metric, which tends to penalize solutions with false positive edges more than those with false negatives.
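The FP/FN evaluation mentioned above can be illustrated with a minimal sketch. It assumes that edges are directed, that every ordered pair of distinct nodes counts as a candidate edge, and that accuracy is (TP + TN) / (TP + FP + FN + TN); the function names and the toy networks are hypothetical and are not taken from the study.

```python
from itertools import permutations


def edge_confusion(true_edges, inferred_edges, nodes):
    """Count TP, FP, FN, TN over all ordered pairs of distinct nodes."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    candidates = set(permutations(nodes, 2))  # every possible directed edge
    tp = len(true_edges & inferred_edges)     # edges correctly recovered
    fp = len(inferred_edges - true_edges)     # edges inferred but not real
    fn = len(true_edges - inferred_edges)     # real edges that were missed
    tn = len(candidates) - tp - fp - fn       # absent edges correctly left out
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of candidate edges classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Toy example (assumed data): a sparse true network on five nodes.
nodes = ["A", "B", "C", "D", "E"]
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]

# A conservative result that misses two true edges.
sparse = [("A", "B")]
# A dense result that recovers every true edge but adds four spurious ones.
dense = true_edges + [("A", "C"), ("A", "D"), ("B", "D"), ("A", "E")]

for name, inferred in [("sparse", sparse), ("dense", dense)]:
    counts = edge_confusion(true_edges, inferred, nodes)
    print(name, counts, accuracy(*counts))
```

In this toy case the sparse result scores an accuracy of 0.9 and the dense result 0.8, even though only the dense result recovers all true edges. A sparse true network offers many more absent edges than present ones, so a method that adds edges has more room to accumulate false positives than a conservative method has to accumulate false negatives.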
Summary
Bayesian networks (BNs) have been applied to several different fields, ranging from water resource management [1] to the discovery of gene regulatory networks [2, 3]. The task of learning a BN can be divided into two subtasks: (1) structural learning, i.e., identification of the topology of the BN, and (2) parametric learning, i.e., estimation of the numerical parameters (conditional probabilities) for a given network topology. Different methods have been proposed to address structural learning, and they can be classified into two categories [4, 5, 6]: (1) methods based on detecting conditional independences, known as constraint-based methods, and (2) score + search methods, known as score-based approaches. Hybrid methods have also been proposed in [7] but, for the sake of clarity, we limit our discussion here to the two mainstream approaches. The number of conditional independence tests that constraint-based methods should perform is exponential in the number of variables, so approximation techniques are required.
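To give a sense of the exponential growth in conditional independence tests, the following sketch counts the (pair, conditioning set) combinations an exhaustive procedure would face. It assumes one test per unordered pair of variables and per subset of the remaining variables. The second function caps the size of the conditioning set, which is one common kind of approximation and is shown only as an illustration, not as the technique evaluated in this work. Both function names are hypothetical.

```python
from math import comb


def exhaustive_ci_tests(n_vars):
    """Count (pair, conditioning set) combinations for n_vars variables.

    Each unordered pair (X, Y) can be conditioned on any subset of the
    remaining n_vars - 2 variables, i.e. 2 ** (n_vars - 2) sets per pair.
    """
    if n_vars < 2:
        return 0
    return comb(n_vars, 2) * 2 ** (n_vars - 2)


def bounded_ci_tests(n_vars, max_cond_size):
    """Same count when conditioning sets hold at most max_cond_size variables."""
    if n_vars < 2:
        return 0
    rest = n_vars - 2
    # Number of subsets of the remaining variables with size 0..max_cond_size.
    sets_per_pair = sum(comb(rest, k) for k in range(min(max_cond_size, rest) + 1))
    return comb(n_vars, 2) * sets_per_pair


for n in (5, 10, 20):
    print(n, exhaustive_ci_tests(n), bounded_ci_tests(n, 2))
```

Under these assumptions, 20 variables already imply 49,807,360 exhaustive combinations, against 32,680 when conditioning sets are limited to two variables.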