Abstract

The study of linkage disequilibrium is a major topic in statistical genetics: it plays a fundamental role in gene mapping and helps us learn more about human history. The complex structure of linkage disequilibrium makes exploratory data analysis essential yet challenging. Visualization methods, such as the triangular heat map implemented in Haploview, provide simple and useful tools to help understand complex genetic patterns, but remain insufficient to fully describe them. Probabilistic graphical models are widely recognized as a powerful formalism for concise and accurate modeling of dependences between variables. In this paper, we propose a method for short-range, long-range and chromosome-wide linkage disequilibrium visualization using forests of hierarchical latent class models. Thanks to its hierarchical nature, our method is shown to provide the geneticist with a compact view of both pairwise and multilocus linkage disequilibrium spatial structures. In addition, a multilocus linkage disequilibrium measure has been designed to evaluate linkage disequilibrium within the clusters of the hierarchy. To learn the proposed model, a new scalable algorithm is presented. It constrains the dependence scope, relying on physical positions, and is able to deal with more than one hundred thousand single nucleotide polymorphisms. The proposed algorithm is fast and does not require phased genotypic data.

Highlights

  • Linkage disequilibrium (LD) refers to non-random associations of alleles at two or more loci, over the human genome [1,2]

  • Forests of hierarchical latent class models (FHLCMs) will be described in detail. We describe another attractive property of FHLCMs as LD visualization tools.

  • Short-range linkage disequilibrium: we illustrate the visualization of short-range LD using the well-known Daly et al. dataset [12], available at http://www-genome.wi.mit.edu/humgen/IBD5/index.html


Summary

Introduction

Linkage disequilibrium (LD) refers to non-random associations of alleles at two or more loci over the human genome [1,2]. Both long-range LD (i.e. LD between loci more than 100 kb apart) [3] and LD between different chromosomes [4] have been observed. LD plays a fundamental role in gene mapping: observing a large number of genetic markers over a chromosomal region enables a precise localization of (non-observed) causal mutations. Based on this property, genome-wide association studies (GWASs) [5,6] aim to systematically localize causal loci over the genome using hundreds of thousands of single nucleotide polymorphisms (SNPs), an abundant and useful class of genetic markers. Bottlenecks, natural selection and migrations are examples of evolutionary events which can be inferred using coalescent models [7].
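To make the notion of "non-random association of alleles" concrete, the following minimal sketch computes two standard pairwise LD statistics, the coefficient D and the squared correlation r², for a pair of biallelic loci. It is an illustration only and not the multilocus measure or the learning algorithm proposed in the paper. It assumes phased haplotypes coded as 0/1 for simplicity (the proposed method itself does not need phased data), and the function name `pairwise_ld` is hypothetical.

```python
from typing import Sequence, Tuple


def pairwise_ld(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r2) for two biallelic loci.

    Assumption: inputs are phased haplotypes, one entry per chromosome,
    with alleles coded 0 or 1 at each locus.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at the first locus
    p_b = sum(hap_b) / n  # frequency of allele 1 at the second locus
    # frequency of the haplotype carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    # D measures the departure from independence: D = 0 means no LD
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")

    return d, d * d / denom


if __name__ == "__main__":
    # Two loci whose alleles always co-occur: complete LD
    print(pairwise_ld([1, 1, 0, 0], [1, 1, 0, 0]))
    # Two loci whose alleles combine independently: no LD
    print(pairwise_ld([1, 1, 0, 0], [1, 0, 1, 0]))
```

Pairwise statistics of this kind are what a triangular heat map displays for every pair of markers; they do not capture multilocus structure, which is the gap a hierarchical model is meant to address.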
