Abstract

Bayesian networks (BNs) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused on prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBN) formalism, which is appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analyses on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and is available as MATLAB source code, under an open source license, by anonymous download from http://www.cgbayesnets.com.

Highlights

  • A Bayesian network (BN) is a data structure that encodes conditional probability distributions between variables of interest by using a graph composed of nodes and directed edges

  • In a comparison study between BNs inferred from only expression data and BNs inferred from expression together with other types of genomic data, the combination of multiple genomic data types results in increased performance [10]. This agrees with our own experience: in our previous work using earlier versions of the CGBayesNets software we identified expression quantitative trait loci (eQTL) in cancer datasets [11] and predicted leukemia types by integrating single nucleotide polymorphisms (SNPs) and messenger ribonucleic acid expression levels [8]

  • While there are several software packages for learning and predicting with Bayesian networks, none provides the mix of features presented by CGBayesNets; in particular, no free implementation provides algorithms for inference in networks of mixed discrete and continuous variables

Summary

Introduction

A Bayesian network (BN) is a data structure that encodes conditional probability distributions between variables of interest by using a graph composed of nodes and directed edges. There are several software packages for doing BN analysis, but to our knowledge all existing free implementations have one of two problems: 1) they do not allow mixed discrete and continuous data in a fully Bayesian mathematical formalism; or 2) they do not perform inference with the BN, merely performing the network learning step.
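To make the idea of inference over mixed discrete and continuous variables concrete, the following is a minimal Python sketch of a two-node conditional Gaussian network, in which a discrete phenotype node is the parent of a continuous measurement node. It is not the CGBayesNets API (the package itself is written in MATLAB); the node names, the prior, and the Gaussian parameters are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical two-node network: discrete Phenotype -> continuous Expression.
# Every number below is an illustrative assumption, not a value from the paper.
PRIOR = {"case": 0.3, "control": 0.7}                  # P(Phenotype)
GAUSS = {"case": (2.0, 1.0), "control": (0.0, 1.0)}    # (mean, std) of Expression given Phenotype


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_phenotype(x):
    """Return P(Phenotype | Expression = x) by Bayes' rule."""
    # Joint P(Phenotype = k, Expression = x) for each discrete state k.
    joint = {k: PRIOR[k] * normal_pdf(x, *GAUSS[k]) for k in PRIOR}
    total = sum(joint.values())  # marginal density of the observed value
    return {k: v / total for k, v in joint.items()}


if __name__ == "__main__":
    for value in (0.0, 1.0, 3.0):
        print(value, posterior_phenotype(value))
```

Because the continuous variable is modeled directly with a Gaussian per discrete parent state, no discretization step is needed; prediction amounts to reading off the posterior of the discrete node given the observed continuous value.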
