A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies.

Raphaël Mourad, Philippe Leray, Christine Sinoquet

doi:10.1186/1471-2105-12-16

Abstract

Background: Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue. However, the dimensionality of the genetic data (up to 1 million genetic markers) and its complexity make the statistical analysis a challenging task.

Results: We present an accurate modeling of dependences between genetic markers, based on a forest of hierarchical latent class models, which is a particular class of probabilistic graphical models. This model offers an adapted framework to deal with the fuzzy nature of linkage disequilibrium blocks. In addition, the data dimensionality can be reduced through the latent variables of the model, which synthesize the information borne by genetic markers. In order to tackle the learning of both forest structure and probability distributions, a generic algorithm has been proposed. A first implementation of our algorithm has been shown to be tractable on benchmarks describing 10^5 variables for 2000 individuals.

Conclusions: The forest of hierarchical latent class models offers several advantages for genome-wide association studies: accurate modeling of linkage disequilibrium, flexible data dimensionality reduction and biological meaning borne by latent variables.

Highlights

Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue.
Implementation: the CFHLC algorithm has been developed in C++, relying on the ProBT library dedicated to Bayesian networks (BNs): http://bayesianprogramming.org
In the subsection "Motivation of the FHLC model for genome-wide association studies (GWASs)", we argued that the multiple layers of a forest of hierarchical latent class models (FHLCM) can describe various degrees of linkage disequilibrium (LD) strength.
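
To make the notion of "LD strength" between two markers concrete, here is a minimal sketch in Python. It is not the CFHLC algorithm and does not use ProBT. It only computes a common genotype-based LD proxy, the squared Pearson correlation r^2 between two SNPs coded as 0/1/2 minor-allele counts. The function name `genotype_r2` and the toy genotype vectors are assumptions made purely for illustration.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNP genotype vectors.

    Genotypes are assumed to be coded 0/1/2 (minor-allele counts), one
    entry per individual. This is an illustrative LD proxy only.
    """
    if len(x) != len(y) or not x:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Monomorphic marker: correlation is undefined, report no LD.
        return 0.0
    return cov * cov / (var_x * var_y)


# Toy data (hypothetical): eight individuals, three SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 1, 0]  # identical to snp_a: perfect LD
snp_c = [2, 0, 1, 1, 0, 2, 0, 1]  # only loosely related to snp_a

strong = genotype_r2(snp_a, snp_b)
weak = genotype_r2(snp_a, snp_c)
print(f"r2(a, b) = {strong:.3f}, r2(a, c) = {weak:.3f}")
```

In this picture, tightly correlated SNPs are the ones a low-layer latent variable would plausibly summarize, while weaker dependences would be left to higher layers. That reading is an interpretation of the highlight above, not a description of how the authors' algorithm chooses clusters.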

Summary

Introduction

Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue. The dimensionality of the genetic data (up to 1 million genetic markers) and its complexity make the statistical analysis a challenging task. Genetic markers such as single nucleotide polymorphisms (SNPs) are the key to dissecting the genetic susceptibility of common complex diseases, such as asthma, diabetes, atherosclerosis and some cancers [1]. Decreasing genotyping costs enable the generation of hundreds of thousands of SNPs, spanning the whole human genome, across cohorts of cases and controls. This scaling up to genome-wide association studies (GWASs) makes the analysis of high-dimensional data a hot topic [2]. Because SNP patterns, rather than single SNPs, are likely to be determinant for complex diseases, three severe issues must be overcome: a high rate of false positives, a perceptible decrease in statistical power, and computational intractability.

Journal: BMC Bioinformatics
Publication Date: Jan 12, 2011
Citations: 76
License type: CC BY 2.0
