Abstract

Quantifying the population stratification in genotype samples has become a standard procedure for data manipulation before conducting genome wide association studies, as well as for tracing patterns of migration in humans and animals, and for inference about extinct founder populations. The most widely used approach capable of providing biologically interpretable results is a likelihood formulation which allows for estimation of founder genome proportions and founder allele frequency conditional on the observed genotypes. However, if founder allele frequencies are known and samples are dominated by admixed genotypes this approach may lead to biased inference. In addition, processing time increases drastically with the number of genetic markers. This article describes a simplified approach for obtaining biologically meaningful measures of population stratification at the genotype level conditional on known founder allele frequencies. It was tested on cattle and human data sets with 4,022 and 150,000 genetic markers, respectively, and proved to be very accurate in situations where founder poplations were correctly specified, or under-, over-, and miss-specified. Moreover, processing time was only marginally affected by an increase in the number of markers.

Highlights

  • The quantification of population stratification in samples of genotypes is of relevance because if not accounted for it may obscure results from genome wide association studies (GWAS) (Marchini et al, 2004; Price et al, 2010)

  • While the mean squared estimation error produced by ADMIXTURE for the 25-crossover data set decreased to 1/3 of that of the 5-crossover data set, it was still five times larger than that produced by constrained genomic regression (CGR)

  • Unlike the maximum likelihood approach of Alexander et al (2009) which searches for the parameter values making the observed allele content most likely, CGR aims at the parameter values minimizing the difference between the expected and observed allele content

Read more

Summary

Introduction

The quantification of population stratification in samples of genotypes is of relevance because if not accounted for it may obscure results from genome wide association studies (GWAS) (Marchini et al, 2004; Price et al, 2010). The first approach, described by Alexander et al (2009) as “model-based ancestry estimation,” estimates genome proportions of sampled genotypes conditional on a predefined number of ancestral populations, and is embedded in software like STRUCTURE (Pritchard et al, 2000; Falush et al, 2003; Raj et al, 2014), FRAPPE (Tang et al, 2005), and ADMIXTURE (Alexander et al, 2009). This approach yields biologically meaningful results at the individual and population levels. Results from this algorithm still need to be appropriately incorporated into a followup GWAS

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call