Abstract
Whole-genome regression methods are being increasingly used for the analysis and prediction of complex traits and diseases. In human genetics, these methods are commonly used for inferences about genetic parameters, such as the amount of genetic variance among individuals or the proportion of phenotypic variance that can be explained by regression on molecular markers. This is so even though some of the assumptions commonly adopted for data analysis are at odds with important quantitative genetic concepts. In this article we develop theory that leads to a precise definition of parameters arising in high-dimensional genomic regressions; we focus on the so-called genomic heritability: the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. We propose a definition of this parameter that is framed within classical quantitative genetics theory and show that the genomic heritability and the trait heritability parameters are equal only when all causal variants are typed. Further, we discuss how the genomic variance and genomic heritability, defined as quantitative genetic parameters, relate to parameters of statistical models commonly used for inference, and indicate potential inferential problems that are assessed further using simulations. When a large proportion of the markers used in the analysis are in linkage equilibrium (LE) with the quantitative trait loci (QTL), the likelihood function can be misspecified. This can induce a sizable finite-sample bias and, possibly, lack of consistency of likelihood (or Bayesian) estimates. This situation can be encountered when the individuals in the sample are distantly related and linkage disequilibrium spans only short regions. This bias does not negate the use of whole-genome regression models as predictive machines; however, our results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.
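The two central quantities in the abstract can be illustrated with a small simulation. The sketch below (names and settings are illustrative, not taken from the article) simulates a trait with heritability 0.5 from 50 unlinked causal QTL and computes the genomic heritability, i.e., the R² of a linear regression of the phenotype on a marker panel, for two panels: the causal variants themselves, and an equally sized panel of markers in linkage equilibrium with the QTL.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 2000, 50  # illustrative sample size and number of loci

# Hypothetical simulation: 50 unlinked biallelic causal QTL, trait h2 = 0.5.
maf = rng.uniform(0.1, 0.5, p)
X_qtl = rng.binomial(2, maf, size=(n, p)).astype(float)
beta = rng.normal(size=p)
g = X_qtl @ beta
g = (g - g.mean()) / g.std() * np.sqrt(0.5)   # genetic values, Var = 0.5
y = g + rng.normal(0.0, np.sqrt(0.5), n)      # phenotype, Var ~ 1

def genomic_h2(markers, y):
    """Sample R^2 of the least-squares regression of y on a marker panel."""
    M = np.column_stack([np.ones(len(y)), markers])
    coef = np.linalg.lstsq(M, y, rcond=None)[0]
    resid = y - M @ coef
    return 1.0 - resid.var() / y.var()

# Panel 1: all causal variants typed -> genomic heritability equals
# the trait heritability (up to sampling noise).
h2_causal = genomic_h2(X_qtl, y)

# Panel 2: markers simulated in linkage equilibrium with the QTL
# (independent genotypes). The population value is 0, but the sample
# R^2 is inflated by roughly p/n, illustrating the finite-sample bias
# discussed in the abstract.
X_le = rng.binomial(2, maf, size=(n, p)).astype(float)
h2_le = genomic_h2(X_le, y)

print(f"genomic h2, causal panel: {h2_causal:.2f}")
print(f"genomic h2, LE panel:     {h2_le:.2f}")
```

With all causal variants typed the estimate sits near 0.5, while the LE panel explains essentially nothing in the population; its small positive sample R² (about p/n here) is a toy version of the overfitting-driven bias the article examines with likelihood-based estimators.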
Highlights
Whole-genome regression (WGR) methods [1] are becoming increasingly used for the analysis and prediction of complex traits, quantitative or categorical.
WGR methods are increasingly used for inferring the proportion of variance that can be explained by a linear regression on a massive number of markers, called 'genomic heritability.'
The statistical assumptions involved in WGRs are somewhat at odds with important quantitative genetics concepts.
Summary
Whole-genome regression (WGR) methods [1] are becoming increasingly used for the analysis and prediction of complex traits, quantitative or categorical. These methods were first developed for prediction in plant and animal breeding (e.g., [2,3]). Most methodological research on WGR has been conducted in animal breeding, with a focus on prediction. Little is known about the inferential properties of estimates derived from WGR models. It is unclear whether the commonly used likelihood-based (or Bayesian) estimators of variance components or of genomic heritability estimate population parameters consistently [7].