Abstract

The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more ‘large-effect’ loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and heritability (H_M^2) for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate p and H_M^2. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. Algebraic solutions for bias-correcting estimates of p and H_M^2 were found that only depend on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have heretofore not been used in complex trait analyses, yielded unbiased estimates of p and H_M^2, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and H_M^2. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.

Highlights

  • The genetic variation observed in nature is frequently caused by genes with quantitative effects [1,2,3,4,5,6,7]

  • The problem we discovered is unrelated to selection bias, the phenomenon in which the effects of discovered quantitative trait loci (QTL) are inflated by biased sampling from truncated distributions with small sample sizes [64,65,66,67,68,69], and unrelated to the upward biases known to arise in genome-wide association studies (GWAS) [70]

  • Although variance components are commonly estimated using REML, as was done in the analyses shown throughout this paper, algebraic analyses of ANOVA expected mean squares (EMSs) identified the source of the bias and yielded explicit algebraic solutions for bias-correcting ANOVA and REML estimates of p and H_M^2


Introduction

The genetic variation observed in nature is frequently caused by genes with quantitative effects [1,2,3,4,5,6,7]. The concept of genomic prediction emerged as a counterpart to GWAS, initially for estimating genomic-estimated breeding values (GEBVs) in domesticated plants and animals and later for estimating polygenic risk scores (PRSs) in humans and model organisms [20,21,22,23]. These technical advances precipitated a consequential shift in the study of quantitative traits from analyses of phenotypic variation limited and informed by pedigree or family data to genome-wide analyses of genotype-to-phenotype associations and genomic prediction informed by genotypic data [6, 7, 13, 16, 20, 24,25,26,27,28,29,30,31].
