Abstract

In gene mapping, it is common to test for association between the phenotype and the genotype at a large number of loci, i.e., the same response variable is used repeatedly to test a large number of non-independent and non-nested hypotheses. In many of these genetic problems, the underlying model is a mixed model consistent of one or very few major genes concurrently with a genetic background effect, usually thought as of polygenic nature and, consequently, modeled through a random effects term with a well-defined covariance structure dependent upon the kinship between individuals. Either because the interest lies only on the major genes or to simplify the analysis, it is habitual to drop the random effects term and use a simple linear regression model, sometimes complemented with testing via resampling as an attempt to minimize the consequences of this practice. Here, it is shown that dropping the random effects term has not only extreme negative effects on the control of the type I error rate, but it is also unlikely to be fixed by resampling because, whenever the mixed model is correct, this practice does not allow to meet some basic requirements of resampling in a gene mapping context. Furthermore, simulations show that the type I error rates when the random term is ignored can be unacceptably high. As an alternative, this paper introduces a new bootstrap procedure to handle the specific case of mapping by using recombinant congenic strains under a linear mixed model. A simulation study showed that the type I error rates of the proposed procedure are very close to the nominal ones, although they tend to be slightly inflated for larger values of the random effects variance. Overall, this paper illustrates the extent of the adverse consequences of ignoring random effects term due to polygenic factors while testing for genetic linkage and warns us of potential modeling issues whenever simple linear regression for a major gene yields multiple significant linkage peaks.

Highlights

  • For more than four decades, linear mixed models have been used in a wide range of applications because of their conceptual simplicity and flexibility to accommodate correlated sources of variation as well as fixed regressors

  • This model is equivalent to model (Equation 4), the recombinant congenic strains (RCS)/quantitative trait locus (QTL) mixed model, with λ replaced by λ

  • This paper proposes a bootstrapping procedure to estimate the pvalues under a mixed model applied to gene mapping when RCS are used

Read more

Summary

Introduction

For more than four decades, linear mixed models have been used in a wide range of applications because of their conceptual simplicity and flexibility to accommodate correlated sources of variation as well as fixed regressors. Either intentionally or unintentionally, the statistical analysis is carried out ignoring the term Zγ in the model This practice, recognized as inefficient, has been thought to be harmless whenever the interest resides solely on a subset of the regression coefficients with the remaining parameters of the model deemed as nuisance. It is well known that ignoring Zγ and using ordinary least squares, results in an estimator of Var βo that is biased and inconsistent as well as non-independent of βo (Dhymes, 1978). This will affect the distribution properties associated with βo under normality or, otherwise, the asymptotic properties of its distribution. An RCS panel is a replicable mapping population for which animals within the same strain are considered to be genetically identical and related to different degrees www.frontiersin.org

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call