Understanding the potential bias of variance components estimators when using genomic models

Beatriz C D Cuyabano, Peter Sørensen, A Christian Sørensen

doi:10.1186/s12711-018-0411-0

Open Access

https://doi.org/10.1186/s12711-018-0411-0

Journal: Genetics, selection, evolution : GSE
Publication Date: Aug 6, 2018
Citations: 7
License type: open-access

Affiliation: Aarhus University

Abstract

Background

Genomic models that link phenotypes to dense genotype information are increasingly being used for inferring variance parameters in genetics studies. The variance parameters of these models can be inferred using restricted maximum likelihood, which produces consistent, asymptotically normal estimates of variance components under the true model. These properties are not guaranteed to hold when the covariance structure of the data specified by the genomic model differs substantially from the covariance structure specified by the true model, and in this case, the likelihood of the model is said to be misspecified. If the covariance structure specified by the genomic model provides a poor description of that specified by the true model, the likelihood misspecification may lead to incorrect inferences.

Results

This work provides a theoretical analysis of the genomic models based on splitting the misspecified likelihood equations into components, which isolates those that contribute to incorrect inferences and provides an informative measure, defined as κ, to compare the covariance structure of the data specified by the genomic and the true models. This comparison of the covariance structures allows us to determine whether or not bias in the variance components estimates is expected to occur.

Conclusions

The theory presented can be used to provide an explanation for the success of a number of recently reported approaches that are suggested to remove sources of bias of heritability estimates. Furthermore, however complex the quantification of this bias may be, we can determine that, in genomic models that consider a single genomic component to estimate heritability (assuming SNP effects are all i.i.d.), the bias of the estimator tends to be downward, when it exists.

Highlights

Genomic models that link phenotypes to dense genotype information are increasingly being used for inferring variance parameters in genetics studies
We define a genomic model as any linear mixed model (LMM) that links a phenotype to multiple genotypes without knowledge of those that are associated with the phenotype
Misspecification of the likelihood is due to the difference between the covariance structures of the data specified by the misspecified and true models (G and GQ), and our study shows that the bias of restricted maximum likelihood (REML) estimators of variance parameters is linked to the relationship between the eigenvalues and eigenvectors of both models, occurring when κi =

Summary

Introduction

Genomic models that link phenotypes to dense genotype information are increasingly being used for inferring variance parameters in genetics studies. The variance parameters of these models can be inferred using restricted maximum likelihood, which produces consistent, asymptotically normal estimates of variance components under the true model. If the covariance structure specified by the genomic model provides a poor description of that specified by the true model, the likelihood misspecification may lead to incorrect inferences. The correct covariance structure (referred to in our work as GQ) requires knowledge of the QTL. Since these are typically unknown, in practice the genomic model uses the available SNP genotypes instead to compute a covariance structure (referred to in our work as G), leading to misspecification of the likelihood. G may provide a poor description of GQ, and the likelihood misspecification may lead to biased estimators of variance parameters.
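The distinction between G and GQ can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and not the authors' method: genotypes are simulated as independent binomial draws, the covariance structure is taken to be ZZ'/m with column-standardized genotypes Z (one common way of building a genomic relationship matrix), the QTL are a random subset of the SNPs, and the function name `standardized_grm` is hypothetical. The off-diagonal correlation printed at the end is only a crude similarity check; it is not the measure κ defined in the paper.

```python
import numpy as np


def standardized_grm(genotypes):
    """Return ZZ'/m for column-standardized genotypes (n individuals x m markers).

    Assumption: this standardized form stands in for the covariance
    structure; the paper may define G and GQ differently.
    """
    z = genotypes - genotypes.mean(axis=0)
    z = z / z.std(axis=0)
    return z @ z.T / genotypes.shape[1]


rng = np.random.default_rng(0)
n, m, n_qtl = 200, 1000, 20  # hypothetical sizes

# Simulate 0/1/2 genotype counts with marker-specific allele frequencies.
freqs = rng.uniform(0.1, 0.9, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)

# Drop any monomorphic markers so the standardization is well defined.
genotypes = genotypes[:, genotypes.std(axis=0) > 0]

# Assumption: the QTL are a random subset of the genotyped SNPs.
qtl_idx = rng.choice(genotypes.shape[1], size=n_qtl, replace=False)

G = standardized_grm(genotypes)               # built from all available SNPs
GQ = standardized_grm(genotypes[:, qtl_idx])  # built from the (normally unknown) QTL

# Crude similarity check: correlation of the off-diagonal relationships.
off_diag = ~np.eye(n, dtype=bool)
corr = np.corrcoef(G[off_diag], GQ[off_diag])[0, 1]
print(f"off-diagonal correlation between G and GQ: {corr:.3f}")

# Eigenvalues of both matrices, the quantities the paper's comparison builds on.
eig_G = np.linalg.eigvalsh(G)
eig_GQ = np.linalg.eigvalsh(GQ)
```

With only 20 QTL, GQ has rank at most 20 while G has much higher rank, so the two eigenvalue spectra differ markedly even though both matrices are built from the same individuals.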
