Abstract

Population structure is known to cause false-positive detection in association studies. We compared the power, precision, and type-I error rates of various association models in analyses of a simulated dataset with structure at the population (admixture from two populations; P) and family (K) levels. We also compared type-I error rates among models in analyses of publicly available human and dog datasets. The models corrected for none, one, or both structure levels. Correction for K was performed with linear mixed models incorporating familial relationships estimated from pedigrees or genetic markers. Linear models that ignored K were also tested. Correction for P was performed using principal component or structured association analysis. In analyses of simulated and real data, linear mixed models that corrected for K controlled type-I error, regardless of whether they also corrected for P. In contrast, correction for P alone in linear models was insufficient. The power and precision of linear mixed models with and without correction for P were similar. Furthermore, power, precision, and type-I error rate were comparable in linear mixed models incorporating pedigree and genomic relationships. In summary, in association studies using samples with both P and K, ancestries estimated using principal components or structured assignment were not sufficient to control type-I errors. In such cases, type-I errors may be controlled by using linear mixed models with relationships derived from either pedigrees or genetic markers.
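
For orientation, the models compared in the abstract can be written in a generic mixed-model form. The notation below is a standard textbook formulation and is an assumption of this sketch; the abstract does not state the authors' exact parameterization.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\, \sigma_u^2 \mathbf{K}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \sigma_e^2 \mathbf{I}\right)
```

Here y is the vector of phenotypes, X holds the intercept, the tested marker and, when P is corrected for, covariates such as principal components or ancestry proportions, and K is the relationship matrix derived from the pedigree or from markers. Omitting the random effect u gives a linear model that ignores K.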

Highlights

  • The power of an association study depends on the phenotypic variance explained by the causal variant, the extent of linkage disequilibrium (LD) between the causal variant and the markers, and, not least, the size of the study sample

  • Type-I error rates were lower in linear mixed models (LMMs), in which familial relationship (K) was used to model the covariance structure of the random individual effect, than in linear models (LMs)

  • At a Bonferroni-corrected significance level of α = 0.05, LMs that did not account for K (LM, LMstr, and LMpca) produced significantly more false-positive results than expected (p < 0.01), whereas LMMs that corrected for K (LMMped, LMMstr, LMMpca, and LMMgmat) showed better control of false-positive results
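
The Bonferroni correction mentioned in the last highlight can be illustrated with a short sketch. The function names and the example marker count are hypothetical and are not taken from the study; the code only shows the general idea of dividing the family-wise level by the number of tests.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def count_significant(p_values, alpha: float = 0.05) -> int:
    """Count the p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sum(p < threshold for p in p_values)


if __name__ == "__main__":
    # Hypothetical scan of 50,000 markers (number chosen for illustration only).
    n_markers = 50_000
    print(f"per-marker threshold: {bonferroni_threshold(0.05, n_markers):.2e}")
```

Under a true null with well-controlled type-I error, the number of markers passing this threshold should be close to zero, so an excess of significant markers in a null dataset is a sign of uncorrected structure.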

Introduction

The power of an association study depends on the phenotypic variance explained by the causal variant, the extent of linkage disequilibrium (LD) between the causal variant and the markers, and, not least, the size of the study sample. Several recent association studies have focused on collecting large samples to obtain greater power of detection [1,2], but this practice is often associated with population stratification problems. Population stratification refers to the inclusion of individuals from isolated subpopulations in the population of interest. In such a population, individuals from a subpopulation are, on average, more closely related to each other than to other individuals in the population as a whole. Stratification manifests in the form of herds, colonies, and ethnic groups, and arises as a consequence of geographic isolation and natural or artificial selection [5]. A subtle form of stratification can occur at the family level, especially in livestock when animals are bred in full-sib or half-sib families [6].
