Abstract

Background: Population stratification and cryptic relationships have been the main sources of excessive false-positives and false-negatives in population-based association studies. Many methods have been developed to model these confounding factors and minimize their impact on the results of genome-wide association studies. In most of these methods, a two-stage approach is applied where: (1) methods are used to determine if there is a population structure in the sample dataset and (2) the effects of population structure are corrected either by modeling it or by running a separate analysis within each sub-population. The objective of this study was to evaluate the impact of population structure on the accuracy and power of genome-wide association studies using a Bayesian multiple regression method.

Methods: We conducted a genome-wide association study in a stochastically simulated admixed population. The genome was composed of six chromosomes, each with 1000 markers. Fifteen segregating quantitative trait loci contributed to the genetic variation of a quantitative trait with heritability of 0.30. The impact of genetic relationships and breed composition (BC) on three analysis methods was evaluated: single marker simple regression (SMR), single marker mixed linear model (MLM) and Bayesian multiple-regression analysis (BMR). Each method was fitted with and without BC. Accuracy, power, false-positive rate and the positive predictive value of each method were calculated and used for comparison.

Results: SMR and BMR, both without BC, were ranked as the worst and the best performing approaches, respectively. Our results showed that, while explicit modeling of genetic relationships and BC is essential for the SMR and MLM models, BMR can disregard them and still achieve higher power without compromising its false-positive rate.

Conclusions: This study showed that the Bayesian multiple-regression analysis is robust to population structure and to relationships among study subjects and performs better than a single marker mixed linear model approach.

Highlights

  • Population stratification and cryptic relationships have been the main sources of excessive false-positives and false-negatives in population-based association studies

  • Thomasen et al. [70] studied population structure in a Danish Jersey population composed of subgroups of animals that originated from Danish or United States Jersey populations and showed that a model that explicitly accounts for breed origin does not improve genomic predictions compared to a model that ignores breed origin. These results suggest that the performance of quantitative trait locus (QTL) mapping in an admixed population, using a Bayesian multiple regression (BMR) approach with high-density markers, may not be hampered by spurious false positives (FP) when breed composition (BC) and relatedness have not been explicitly accounted for

  • On the other hand, modeling BC dramatically improved accuracy, false-positive rate (FPR) and positive predictive value (PPV) when the SLIDE method was used for finding thresholds (Table 3); models accounting for BC had accuracies 30 to 60% higher than those that did not and a PPV improved by 300%, at the cost of a loss of power of more than 50%
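The four comparison criteria named above (accuracy, power, FPR and PPV) can all be derived from counts of true and false detections. The following is a minimal sketch, assuming each marker is labeled simply as causal or non-causal and as detected or not detected; the function name `detection_metrics` and the example counts are hypothetical, and the study itself may define accuracy differently (for example, over genomic windows rather than single markers).

```python
def detection_metrics(tp, fp, tn, fn):
    """Summarize detection performance from confusion counts.

    tp: causal loci that were detected
    fp: non-causal loci that were detected
    tn: non-causal loci that were not detected
    fn: causal loci that were missed
    """
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        # Power: share of causal loci that were detected.
        "power": tp / (tp + fn) if (tp + fn) else nan,
        # False-positive rate: share of non-causal loci that were detected.
        "fpr": fp / (fp + tn) if (fp + tn) else nan,
        # Positive predictive value: share of detections that are causal.
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        # Accuracy (assumed definition): share of all loci classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
    }


# Hypothetical counts for a genome of 6 x 1000 markers with 15 causal loci:
# 10 causal loci detected, 5 missed, 20 spurious detections.
m = detection_metrics(tp=10, fp=20, tn=5965, fn=5)
print(m)
```

The example shows why these criteria can move in opposite directions: a stricter threshold would lower `fp` (raising PPV and lowering FPR) while also lowering `tp`, which reduces power.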


Summary

Introduction

Population stratification and cryptic relationships have been the main sources of excessive false-positives and false-negatives in population-based association studies. Many methods have been developed to model these confounding factors and minimize their impact on the results of genome-wide association studies (GWAS). In most of these methods, a two-stage approach is applied where: (1) methods are used to determine if there is a population structure in the sample dataset and (2) the effects of population structure are corrected either by modeling it or by running a separate analysis within each sub-population. Many approaches have been developed to account for population structure (PS) and relatedness in a population-based GWAS, including genomic control (GC), structured association (SA), principal component analysis (PCA), mixed linear models (MLM) and multiple regression analysis. Compared to simple monogenic traits, complex polygenic phenotypes are more vulnerable to elevated false-positive (FP) rates in GWAS, where the magnitude of signals from multiple QTL may be comparable to those resulting from PS [27].
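To illustrate how population structure can create a spurious signal and how modeling it removes that signal, the sketch below simulates a toy admixed sample and tests one non-causal marker by ordinary least squares, with and without breed composition as a covariate. It is not the simulation or the models used in the study: the sample size, allele frequencies, breed effect and the helper name `marker_t_stat` are all assumptions chosen for illustration.

```python
import numpy as np


def marker_t_stat(y, genotype, covariates=None):
    """t-statistic of the marker effect in an ordinary least-squares fit."""
    n = y.shape[0]
    columns = [np.ones(n), genotype]      # intercept and marker genotype
    if covariates is not None:
        columns.append(covariates)        # e.g. breed composition
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])


rng = np.random.default_rng(0)
n = 2000

# Assumed toy population: breed composition (fraction of breed A) per animal.
bc = rng.uniform(0.0, 1.0, n)

# The marker's allele frequency differs between breeds, so the genotype
# is correlated with breed composition.
allele_freq = 0.2 + 0.6 * bc
genotype = rng.binomial(2, allele_freq).astype(float)

# The phenotype depends on breed composition only; the marker has NO effect.
y = 2.0 * bc + rng.normal(0.0, 1.0, n)

t_naive = marker_t_stat(y, genotype)                    # ignores structure
t_adjusted = marker_t_stat(y, genotype, covariates=bc)  # models breed composition

print(f"t without BC: {t_naive:.2f}")
print(f"t with BC:    {t_adjusted:.2f}")
```

In this toy setting the unadjusted test statistic is inflated because the marker acts as a proxy for breed composition, while the adjusted statistic is not. This corresponds to the "modeling it" branch of step (2) for a single marker regression.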

