Abstract

Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.

Highlights

  • Learning Bayesian Networks from Independent and Identically Distributed Observations

  • A BN is a vector of random variables Y = (Y1, ..., Yv) whose joint probability distribution factorizes according to the local and global Markov properties of an associated directed acyclic graph (DAG) [13,14,15]

  • There are well established approaches to structure learning of BNs [6,7,13] that use either exact Bayesian criteria based on the marginal likelihood p(D|M) = ∫ p(D|θ, M) p(θ|M) dθ, or asymptotic criteria such as AIC = −2 log p(D|θ̂) + 2p, or BIC = −2 log p(D|θ̂) + log(n) p, where D denotes the sample of size n, M denotes the BN structure, θ is a vector of p model parameters, p(D|θ, M) and p(θ|M) denote the likelihood function and the prior distribution of the parameters, and θ̂ is the maximum likelihood estimate of θ
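The asymptotic criteria above are straightforward to compute once a model has been fit. The following sketch evaluates AIC and BIC for two hypothetical candidate structures; the log-likelihood values and parameter counts are made up for illustration, and smaller scores favor a model:

```python
import math

def aic(loglik, p):
    """Akaike information criterion: -2 * log-likelihood + 2 * (number of parameters)."""
    return -2.0 * loglik + 2 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2 * log-likelihood + log(n) * (number of parameters)."""
    return -2.0 * loglik + math.log(n) * p

# Hypothetical comparison on a sample of size n = 100:
# structure A attains log-likelihood -120.0 with 4 parameters,
# structure B attains -118.5 but needs 7 parameters.
n = 100
score_a = (aic(-120.0, 4), bic(-120.0, 4, n))
score_b = (aic(-118.5, 7), bic(-118.5, 7, n))
print(score_a, score_b)  # both criteria penalize B's extra parameters
```

Note how BIC's log(n) penalty grows with the sample size, so it favors sparser structures than AIC as n increases.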


Summary

Learning Bayesian Networks from Correlated Data. Received: 02 October 2015; Accepted: 08 April 2016; Published: 05 May 2016.

We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. It is well known that ignoring the correlation between observations can impact the false positive rates of regression methods [10], and the same problem is likely to persist when using BNs. As an example, Fig. 1 illustrates the effect of ignoring the correlation between observations when learning the network structure using three common model selection metrics. We extend mixed-effects regression models to BNs and present the results of simulation studies that describe the inflation of the Type I error rate due to ignoring correlated data, and compare different model selection metrics that can be used for learning mixed-effects BNs. We illustrate our proposed approach in two real data examples.
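The key mechanism behind the mixed-effects parameterization is that a shared random intercept induces correlation among observations from the same sample unit. The simulation below is an illustrative sketch (not the paper's model): it generates clustered data y_ij = b_i + e_ij with a per-cluster random effect b_i, and checks that the empirical within-cluster correlation matches the theoretical intraclass correlation σ_b² / (σ_b² + σ_e²). All parameter values are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters, per_cluster = 500, 4
sigma_b, sigma_e = 1.0, 1.0  # random-effect and residual SDs (illustrative values)

# y_ij = b_i + e_ij : the shared random intercept b_i correlates observations within cluster i
b = rng.normal(0.0, sigma_b, size=(n_clusters, 1))
y = b + rng.normal(0.0, sigma_e, size=(n_clusters, per_cluster))

# Theoretical intraclass correlation: sigma_b^2 / (sigma_b^2 + sigma_e^2) = 0.5 here
pairs = [(y[:, i], y[:, j]) for i in range(per_cluster) for j in range(i + 1, per_cluster)]
icc_hat = float(np.mean([np.corrcoef(a, c)[0, 1] for a, c in pairs]))
print(round(icc_hat, 2))  # close to the theoretical value 0.5
```

Treating the 2000 generated observations as i.i.d. would understate the effective sample size, which is what inflates false positive rates in naive analyses.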

Background
Simulation Studies
Discussion and Conclusions
Findings
Additional Information

