Abstract

Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, and so are expected to perform well in different situations. In practice, however, one typically does not know which set of assumptions will be more accurate for a given dataset. Motivated by this, we consider a hybrid of the two, which we refer to as a “Bayesian sparse linear mixed model” (BSLMM), that includes both of these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods in two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction, it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html.

Highlights

  • Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications

  • Through simulations and applications to real data, we show that the “Bayesian sparse linear mixed model” (BSLMM) successfully combines the advantages of both LMMs and sparse regression, is robust to a variety of settings in PVE estimation, and outperforms both models, and several related models, in phenotype prediction

  • A related issue is the interpretation of the random effect u in BSLMM: from the way we have presented the material, u is most naturally interpreted as representing a polygenic component, that is, the combined effect of a large number of small effects across all measured markers

Summary

Introduction

Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications. LMMs are often used to control for population stratification, individual relatedness, or unmeasured confounding factors when performing association tests in genetic association studies [1,2,3,4,5,6,7,8,9] and gene expression studies [10,11,12]. They have also been used in genetic association studies to jointly analyze groups of SNPs [13,14]. The particular polygenic modeling problems that we focus on here are estimating “chip heritability”, that is, the proportion of variance in phenotypes explained (PVE) by available genotypes [19,22,23,24], and predicting phenotypes based on genotypes [25,26,27,28,29].
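To make the PVE quantity concrete, the following is a minimal simulation sketch, not the BSLMM software or its MCMC algorithm. It assumes a toy phenotype built from a few larger marker effects plus small effects at every marker, with standard normal noise, and reports the realized share of phenotypic variance attributable to the genotypes. The function name `simulate_pve`, its parameters, and the use of Gaussian "genotypes" are all illustrative assumptions that do not come from the paper.

```python
import random
from statistics import pvariance


def simulate_pve(n=300, p=100, n_large=5, sd_large=0.5, sd_small=0.05, seed=0):
    """Simulate a toy phenotype and return (y, realized PVE).

    Hypothetical setup: genotypes are standard normal draws rather than
    real 0/1/2 allele counts, and all parameter names are illustrative.
    """
    rng = random.Random(seed)

    # n individuals by p markers of standardized, genotype-like values.
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]

    # Polygenic part: every marker carries a small effect.
    beta = [rng.gauss(0, sd_small) for _ in range(p)]

    # Sparse part: a few randomly chosen markers get an extra, larger effect.
    for j in rng.sample(range(p), n_large):
        beta[j] += rng.gauss(0, sd_large)

    # Genetic value of each individual, plus independent residual noise.
    genetic = [sum(x * b for x, b in zip(row, beta)) for row in X]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    y = [g + e for g, e in zip(genetic, noise)]

    # Realized PVE: genetic variance over genetic plus residual variance.
    var_g = pvariance(genetic)
    var_e = pvariance(noise)
    return y, var_g / (var_g + var_e)


if __name__ == "__main__":
    _, pve = simulate_pve()
    print(f"realized PVE in this toy simulation: {pve:.3f}")
```

Setting `n_large=0` gives a purely polygenic phenotype of the kind an LMM assumes, while setting `sd_small=0.0` gives a purely sparse one. In real data the effects are unknown, so PVE has to be estimated from the phenotypes and genotypes rather than read off as it is here.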
