Abstract

Multiple-SNP analysis, in which the effects of multiple SNPs are simultaneously estimated and tested in a multiple linear regression, has been studied by many researchers. Multiple-SNP association analysis usually has higher power and a lower false-positive rate for detecting causative SNP(s) than single marker analysis (SMA), and several methods have been proposed to estimate and test multiple SNP effects simultaneously. In this research, we developed a fast method called MEML (Mixed model based Expectation-Maximization Lasso algorithm) for simultaneously estimating multiple SNP effects. An improved Lasso prior was assigned to the SNP effects, which were estimated as the mode of their joint posterior distribution. A residual polygenic effect, treated as missing data in our EM algorithm, was included in the model to absorb the many tiny SNP effects. A series of simulation experiments was conducted to validate the proposed method, and the results showed that, compared with single-marker mixed-model analysis (SMMA), the new method dramatically decreases the false-positive rate. The new method was also applied to a 50k SNP-panel dataset for a genome-wide association study of milk production traits in Chinese Holstein cattle. In total, 39 significant SNPs and 25 nearby genes were identified, remarkably fewer than the 105 significant SNPs found by SMMA. Of the 39 significant SNPs, 8 were also found by SMMA, and several well-known QTLs or genes were confirmed; we also identified some positional candidate genes with potential functions affecting milk production traits. These novel findings should be valuable for further investigation.
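The abstract describes MEML only at a high level. As a rough illustration of the general idea (an EM search for the posterior mode of SNP effects under a Lasso prior, with a polygenic effect treated as missing data), the sketch below uses a plain Laplace prior written as a scale mixture of normals, so the per-SNP prior variances are also treated as missing data. It is not the authors' MEML implementation: the "improved" prior is not reproduced, the variance components and relationship matrix K are assumed known and fixed, the data are assumed centred, and the function name `em_lasso_polygenic` and all simulation settings are hypothetical.

```python
import numpy as np


def em_lasso_polygenic(X, y, K, gamma, sigma2_e, sigma2_u, n_iter=200, tol=1e-8):
    """EM sketch for the posterior mode of SNP effects under a Laplace (Lasso)
    prior, with a polygenic effect u ~ N(0, sigma2_u * K) treated as missing data.

    Assumptions (illustrative only): X and y are centred (no intercept), the
    variance components sigma2_e and sigma2_u are fixed and known, and gamma is
    the rate of a plain Laplace prior p(b_j) ~ exp(-gamma * |b_j|).
    """
    n, p = X.shape
    XtX = X.T @ X
    V_inv = np.linalg.inv(sigma2_e * np.eye(n) + sigma2_u * K)  # Var(u + e)^-1
    # Ridge start so that no effect begins at exactly zero.
    beta = np.linalg.solve(XtX + np.eye(p), X.T @ y)
    u_hat = np.zeros(n)
    for _ in range(n_iter):
        # E-step for u: posterior mean of the polygenic effect given beta.
        u_hat = sigma2_u * (K @ (V_inv @ (y - X @ beta)))
        # E-step for the Laplace scale variables gives weights gamma / |b_j|;
        # the M-step is then the weighted ridge solve
        #   (X'X + sigma2_e * gamma * diag(1/|b|)) b = X'(y - u_hat),
        # rewritten with s = sqrt(|b|) so that b_j near 0 causes no division.
        s = np.sqrt(np.abs(beta))
        A = XtX * np.outer(s, s) + sigma2_e * gamma * np.eye(p)
        beta_new = s * np.linalg.solve(A, s * (X.T @ (y - u_hat)))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, u_hat


# Hypothetical simulation: 20 candidate SNPs (2 causal) plus a polygenic
# background built from 200 other markers with tiny effects.
rng = np.random.default_rng(0)
n, p, m = 300, 20, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)
Z = rng.binomial(2, 0.3, size=(n, m)).astype(float)
Z -= Z.mean(axis=0)
K = Z @ Z.T / m                             # genomic relationship matrix
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]
u_true = Z @ rng.normal(0.0, 0.05, m)       # Var(u_true) = 0.5 * K
y = X @ beta_true + u_true + rng.normal(0.0, 1.0, n)
y -= y.mean()

beta_hat, u_hat = em_lasso_polygenic(X, y, K, gamma=10.0, sigma2_e=1.0, sigma2_u=0.5)
print(np.round(beta_hat, 2))
```

In this sketch the polygenic term plays the role the abstract assigns to the residual polygenic effect: many small effects are absorbed by `u_hat` rather than by the Lasso-penalised SNP effects. In a real analysis the variance components would have to be estimated, not fixed.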

Highlights

  • Single marker analysis (SMA) is the most practical approach to genome-wide association studies (GWAS), in which one SNP is tested at a time along the genome

  • The results showed that single-marker mixed-model analysis (SMMA) had higher power than MEML, but SMMA produced far more false positives than MEML

  • BIDE is implemented via Bayesian Markov chain Monte Carlo (MCMC), whereas EMAIL is implemented via an EM algorithm
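For contrast with the multiple-SNP approach, single marker analysis fits one regression per SNP and tests each effect separately. The sketch below assumes ordinary least squares per SNP, a normal approximation for the p-values (reasonable for large samples) and a Bonferroni threshold; it omits the polygenic random effect that the SMMA used in the paper adds. The function name `single_marker_scan` and the simulated data are hypothetical.

```python
import math

import numpy as np


def single_marker_scan(X, y):
    """Fit y = mu + b_j * x_j + e separately for every SNP column x_j of X.

    Returns per-SNP effect estimates, t statistics and two-sided p-values.
    P-values use a normal approximation to the t distribution, which assumes
    a reasonably large sample; monomorphic SNPs are assumed to be removed.
    """
    n, p = X.shape
    yc = y - y.mean()
    b = np.empty(p)
    t = np.empty(p)
    for j in range(p):
        xc = X[:, j] - X[:, j].mean()
        sxx = xc @ xc
        b[j] = (xc @ yc) / sxx                 # least-squares slope
        resid = yc - b[j] * xc
        s2 = (resid @ resid) / (n - 2)         # residual variance (mu and b_j fitted)
        t[j] = b[j] / math.sqrt(s2 / sxx)
    pvals = np.array([math.erfc(abs(tj) / math.sqrt(2.0)) for tj in t])
    return b, t, pvals


# Hypothetical example: 500 animals, 100 SNPs coded 0/1/2, one causal SNP (index 3).
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(500, 100)).astype(float)
y = 0.5 * X[:, 3] + rng.normal(size=500)
b, t, pvals = single_marker_scan(X, y)
hits = np.flatnonzero(pvals < 0.05 / X.shape[1])  # Bonferroni-corrected threshold
print(hits)
```

Because every SNP is tested in isolation, a SNP that is merely in linkage disequilibrium with a causal one can also pass the threshold, which is one source of the extra false positives that multiple-SNP methods aim to reduce.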


Summary

Introduction

Single marker analysis (SMA) is the most practical approach to genome-wide association studies (GWAS), in which one SNP is tested at a time along the genome. Although SMA provides a simple and fast way to perform genome-wide QTL mapping, it neglects the effects of other genes on the genome when only one SNP is tested.

Variational Bayes (VB) estimates the posterior expectation by iterative calculation and avoids the time-consuming Markov chain Monte Carlo (MCMC) algorithm, so it is suitable for large numbers of SNPs. Wu et al. used Lasso-penalized logistic regression for genome-wide association studies of multiple main-effect and interacting-effect SNPs in a case-control design [5]. A Bayesian Lasso approach was used by Li et al. for shrinkage estimation of multiple-SNP effects on human body mass index [6]. Before the Lasso estimation, a preconditioning procedure is performed via supervised principal component analysis to reduce the effect of observational noise on model selection; this denoises the response variable so that variable selection becomes more efficient.
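The preconditioning idea (supervised principal components to denoise the response, then a Lasso fit on the denoised response) can be sketched as follows. This is the general recipe only, not the exact procedure of the cited study: the number of retained SNPs, the number of components and the penalty λ are illustrative choices, the Lasso is solved here by plain cyclic coordinate descent, and the names `precondition`, `lasso_cd` and `soft_threshold` are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes y and the columns of X are centred (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)               # current residual y - Xb
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            old = b[j]
            rho = X[:, j] @ r / n + col_ss[j] * old   # uses the partial residual
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def precondition(X, y, n_keep, n_comp=1):
    """Supervised PCA: keep the n_keep SNPs most correlated with y, take the
    leading n_comp principal components of those SNPs, and return the
    least-squares fit of (centred) y on the components as the denoised response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0
    score = np.abs(Xc.T @ yc) / sd                 # proportional to |corr(x_j, y)|
    keep = np.argsort(score)[::-1][:n_keep]
    U, _, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    P = U[:, :n_comp]                              # orthonormal leading components
    return P @ (P.T @ yc)


# Hypothetical data: SNPs 0-4 are in LD with an unobserved causal variant.
rng = np.random.default_rng(2)
n, p = 500, 100
causal = rng.binomial(2, 0.3, n).astype(float)
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
for j in range(5):
    X[:, j] = np.where(rng.random(n) < 0.9, causal, X[:, j])
y = 0.6 * causal + rng.normal(size=n)

Xc = X - X.mean(axis=0)
y_tilde = precondition(X, y, n_keep=5, n_comp=1)
lam_max = np.max(np.abs(Xc.T @ y_tilde)) / n     # smallest lam giving b = 0
b = lasso_cd(Xc, y_tilde, lam=0.5 * lam_max)
print(np.flatnonzero(b))
```

Fitting the Lasso to `y_tilde` instead of `y` removes the part of the response that is unrelated to the leading SNP components, which is the sense in which the response is "denoised" before variable selection.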

