Abstract

The accurate mapping of causal variants in genome-wide association studies requires the consideration of both confounding factors (for example, population structure) and nonlinear interactions between individual genetic variants. Here, we propose a method termed 'mixed random forest' that simultaneously accounts for population structure and captures nonlinear genetic effects. We test the model in simulation experiments and show that the mixed random forest approach improves detection power compared with established approaches. In an application to data from an outbred mouse population, we find that mixed random forest identifies associations that are more consistent with prior knowledge than those identified by competing methods. Further, our approach predicts phenotypes from genotypes with greater accuracy than any of the other methods that we tested. Our results show that approaches that simultaneously account for both confounding due to population structure and epistatic interactions are important to fully explain the heritable component of complex quantitative traits.

Highlights

  • The accurate mapping of causal variants in genome-wide association studies requires the consideration of both confounding factors and nonlinear interactions between individual genetic variants

  • We considered BLUP for this prediction task[36], which is equivalent to mixed random forest (RF) and to the linear mixed model (LMM) least absolute shrinkage and selection operator (LASSO) when the estimation of direct genetic factors is dropped, such that prediction is based solely on the model of the polygenic background

  • Here, we presented an extension to the popular LMMs

Summary

Introduction

The accurate mapping of causal variants in genome-wide association studies requires the consideration of both confounding factors (for example, population structure) and nonlinear interactions between individual genetic variants. One approach to addressing polygenic trait architectures is to use multivariate extensions of the linear mixed model (LMM), either by including multiple fixed effects in the model[16,17] or by aggregating over the effects of multiple loci using additional random effect terms[18,19,20]. While such multivariate approaches have been shown to be effective for explaining linear additive and polygenic genetic components, they do not address non-additive epistatic effects, which for some traits have been shown to explain a larger proportion of phenotypic variation than additive effects[11]. It is straightforward to construct LMMs to test for pairwise epistasis by considering all possible two-locus models[21]. Such exhaustive approaches, however, are computationally demanding and do not address genetic models with more than two loci or other types of non-additive interactions between multiple alleles.
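To give a rough sense of why exhaustive interaction scans are computationally demanding, the following minimal sketch counts how many distinct two-locus and three-locus models would have to be fitted for a given number of variants. The function name `n_interaction_tests` and the variant counts are hypothetical choices for illustration only; they are not taken from the study.

```python
from math import comb


def n_interaction_tests(n_variants: int, order: int = 2) -> int:
    """Count the distinct variant subsets of size `order` among `n_variants`.

    Each subset corresponds to one interaction model that an exhaustive
    scan would need to fit and test.
    """
    return comb(n_variants, order)


# Hypothetical variant counts, chosen only to show how the search space grows.
for n in (1_000, 100_000, 1_000_000):
    pairs = n_interaction_tests(n, 2)
    triples = n_interaction_tests(n, 3)
    print(f"{n:>9,} variants: {pairs:.3e} pairs, {triples:.3e} triples")
```

The number of pairs grows quadratically with the number of variants and the number of triples grows cubically, so higher-order exhaustive scans quickly become impractical even before the cost of fitting each individual model is considered.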
