Learning Genetic Population Structures Using Minimization of Stochastic Complexity

Jukka Corander,Mats Gyllenberg,Timo Koski

doi:10.3390/e12051102

Abstract

Considerable research efforts have been devoted to probabilistic modeling of genetic population structures within the past decade. In particular, a wide spectrum of Bayesian models has been proposed for unlinked molecular marker data from diploid organisms. Here we derive a theoretical framework for learning the genetic population structure of a haploid organism from bi-allelic markers for which potential patterns of dependence are a priori unknown and are to be explicitly incorporated in the model. Our framework is based on the principle of minimizing the stochastic complexity of an unsupervised classification under a tree augmented factorization of the predictive data distribution. We discuss a fast implementation of the learning framework using deterministic algorithms.

Highlights

The concept of a structured or subdivided population has received intensive attention in both applied and theoretical population genetics for several decades.
Although the traditional formulation is still widely used in applied population genetics, a complementary approach to determining how subdivided a population is has grown popular within the most recent decade.
The classification itself will be made notationally explicit in a later section, whereas here we seek a probabilistic representation of the possible dependencies among the d allele frequencies of the individual loci in any particular pool.

Summary

Introduction

The concept of a structured or subdivided population has received intensive attention in both applied and theoretical population genetics for several decades. Such populations can be considered to harbor multiple pools of individuals, each associated with distinct allele frequencies over a set of molecular marker loci. Although the traditional formulation is still widely used in applied population genetics, a complementary approach to determining how subdivided a population is has grown popular within the most recent decade. This approach is fundamentally based on statistical learning of a mixture model for genotype data from multiple marker loci, where the mixture components represent gene pools that have drifted apart over time. In the current work we derive a generalization of the unsupervised classification approach to inferring genetic population structures from biallelic multilocus data, where the loci are no longer assumed to be conditionally independent given a pool. The final sections discuss deterministic algorithms for learning the optimal population structure under the introduced framework and provide some concluding remarks.
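
For orientation, the conditionally independent baseline that this work generalizes can be sketched in a few lines. The sketch below is not the paper's formula: it assumes a Beta(1/2, 1/2) prior on each allele frequency, treats stochastic complexity as the negative log marginal likelihood of the data given a classification, and ignores any cost for encoding the classification itself. The names `stochastic_complexity`, `log_beta`, and the toy data are hypothetical and introduced only for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def stochastic_complexity(data, labels, a=0.5, b=0.5):
    """Negative log marginal likelihood of binary data given pool labels.

    Assumptions (not taken from the paper): loci are conditionally
    independent given the pool, and each allele frequency has an
    independent Beta(a, b) prior.
    """
    d = len(data[0])
    sc = 0.0
    for pool in set(labels):
        rows = [x for x, c in zip(data, labels) if c == pool]
        n = len(rows)
        for j in range(d):
            n1 = sum(r[j] for r in rows)  # count of allele "1" at locus j
            # Beta-Bernoulli marginal likelihood for this pool and locus
            sc -= log_beta(a + n1, b + n - n1) - log_beta(a, b)
    return sc


# Toy haploid data: 12 individuals, 4 bi-allelic loci, two distinct pools.
data = [(0, 0, 0, 0)] * 6 + [(1, 1, 1, 1)] * 6
two_pools = [0] * 6 + [1] * 6
one_pool = [0] * 12

sc_two = stochastic_complexity(data, two_pools)
sc_one = stochastic_complexity(data, one_pool)
print("two pools:", sc_two, "one pool:", sc_one)
```

On this toy data the two-pool classification yields the smaller value, which is the sense in which minimizing stochastic complexity selects a population structure. The framework described above replaces the per-locus independence assumed here with a tree augmented factorization, so that dependence between loci within a pool is modeled explicitly.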

Results and Discussion

Discussion

Prior predictive data distributions under Chow expansion

Asymptotic expansion of the stochastic complexity for a Chow expansion


Journal: Entropy	Publication Date: May 5, 2010
Citations: 43	License type: CC BY 3.0
