Abstract

Background

The information provided by dense genome-wide markers using high-throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often, SNP lie in genomic regions with no trait variation. Whole-genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However, a full Bayesian analysis is often not feasible because of the large computational time involved.

Results

This article proposes an expectation-maximization (EM) algorithm called emBayesB, which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP, along with estimates of the hyperparameters of the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to that of a full Bayesian analysis, but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL that explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is also described.

Conclusions

emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to that of Bayesian methods, but it takes only a fraction of the time.
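The abstract names the ingredients of emBayesB (a mixture prior under which only a proportion of SNP have non-zero effects, a per-SNP posterior probability of being in LD with a QTL, and estimated hyperparameters) but not its update equations. The Python sketch below illustrates those ingredients under simplifying assumptions that are not taken from the paper: the prior is a point mass at zero mixed with a normal "slab" (the slab distribution used by emBayesB may differ), and the E-step uses mean-field coordinate updates, which makes this a variational approximation to EM rather than the published algorithm. The function `em_spike_slab`, the simulated data and the QTL positions are all hypothetical.

```python
import numpy as np


def em_spike_slab(X, y, n_iter=100, tol=1e-6):
    """Approximate EM fit of y = mu + X @ beta + e (illustrative only).

    Assumed prior (not the paper's): beta_j = 0 with probability 1 - pi,
    otherwise beta_j ~ N(0, sigma_b2). The E-step uses mean-field coordinate
    updates. Returns posterior inclusion probabilities, posterior mean
    effects, mu and the hyperparameters (pi, sigma_b2, sigma_e2).
    """
    n, p = X.shape
    d = np.einsum("ij,ij->j", X, X)          # x_j' x_j for every SNP
    mu = y.mean()
    pi, sigma_b2, sigma_e2 = 0.5, 1.0, y.var()
    m = np.zeros(p)                          # slab posterior means
    v = np.zeros(p)                          # slab posterior variances
    pip = np.zeros(p)                        # P(SNP j has a non-zero effect)
    resid = y - mu                           # y - mu - X @ E[beta]
    for _ in range(n_iter):
        pip_old = pip.copy()
        log_prior_odds = np.log(pi / (1.0 - pi))
        # E-step: update each SNP given current expected effects of the others
        for j in range(p):
            resid += X[:, j] * (pip[j] * m[j])       # remove SNP j's effect
            v[j] = 1.0 / (d[j] / sigma_e2 + 1.0 / sigma_b2)
            m[j] = v[j] * (X[:, j] @ resid) / sigma_e2
            log_bf = 0.5 * np.log(v[j] / sigma_b2) + 0.5 * m[j] ** 2 / v[j]
            z = np.clip(log_prior_odds + log_bf, -500.0, 500.0)
            pip[j] = 1.0 / (1.0 + np.exp(-z))
            resid -= X[:, j] * (pip[j] * m[j])       # add back updated effect
        # M-step: closed-form updates of mu and the mixture hyperparameters
        shift = resid.mean()
        mu += shift
        resid -= shift
        second_moment = pip * (m ** 2 + v)           # E[beta_j^2]
        var_beta = second_moment - (pip * m) ** 2    # Var[beta_j]
        pi = float(np.clip(pip.mean(), 1e-6, 1.0 - 1e-6))
        sigma_b2 = second_moment.sum() / pip.sum()
        sigma_e2 = (resid @ resid + d @ var_beta) / n
        if np.max(np.abs(pip - pip_old)) < tol:
            break
    return pip, pip * m, mu, pi, sigma_b2, sigma_e2


# Hypothetical data: 500 individuals, 200 unlinked SNP, 5 QTL of effect 1
rng = np.random.default_rng(1)
n, p = 500, 200
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
X -= X.mean(axis=0)                          # centre genotype codes 0/1/2
qtl = np.array([10, 50, 90, 130, 170])
beta = np.zeros(p)
beta[qtl] = 1.0
tbv = X @ beta                               # true breeding values
y = 5.0 + tbv + rng.normal(0.0, 1.0, n)

pip, beta_hat, mu, pi_hat, sb2, se2 = em_spike_slab(X, y)
gebv = X @ beta_hat                          # genomic estimated breeding values
print(sorted(np.argsort(pip)[-5:].tolist()), round(pi_hat, 3),
      round(float(np.corrcoef(gebv, tbv)[0, 1]), 3))
```

Each iteration costs one pass over the SNP, and the hyperparameters (pi, sigma_b2, sigma_e2) are re-estimated in closed form at every M-step; SNP with inclusion probabilities near 1 are the candidates for being in LD with a QTL.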

Highlights

  • The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs

  • The emBayesB correlation of 0.88 between genomic estimated breeding values (GEBV) and true breeding values (TBV) for all 1200 individuals was similar to correlations of 0.84 to 0.87 for Bayesian Markov chain Monte Carlo (MCMC) methods applied to the same data, but larger than correlations of 0.5 to 0.77 for various BLUP models [17]

  • This paper reports an EM algorithm called emBayesB for genome-wide prediction, in which breeding values are predicted jointly from dense single nucleotide polymorphism (SNP) marker data


Summary

Introduction

The information provided by dense genome-wide markers using high-throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association (GWA) studies are increasingly used for risk prediction in humans and trait prediction in livestock. Such studies associate individual single nucleotide polymorphisms (SNP) from a dense genome-wide panel with between-individual variation in traits. Instead of testing hundreds of thousands of separate hypotheses of the form 'is this single SNP associated with the trait?', as in GWA, the problem is recast as 'what function of the entire SNP information provides the best predictor of the trait?'. The outcome of this approach is that many more loci are used in prediction. Although the set will include false positive loci, it also includes many more true positive effects, and the overall predictive power is much improved [8]. This approach to genome-wide prediction is called genomic selection and is being applied to livestock in practice [9].
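The contrast between testing SNP one at a time and asking which function of all SNP best predicts the trait can be illustrated with a small simulation. The Python sketch below is a toy under stated assumptions (unlinked SNP, 200 equal-sized QTL, heritability 0.5, variance ratio known); it does not use the data or method of the paper, and ridge regression (SNP-BLUP) stands in as the simplest whole-panel predictor rather than emBayesB. Accuracy is measured, as in genomic selection, by the correlation between predicted and true breeding values in individuals not used for fitting; the `accuracy` helper and all simulation settings are hypothetical.

```python
import math

import numpy as np

rng = np.random.default_rng(7)
n_train, n_test, p, n_qtl = 1000, 1000, 1000, 200

# Hypothetical simulation: unlinked SNP coded 0/1/2, 200 QTL of equal size,
# genetic and residual variance both 1 (heritability 0.5)
X = rng.binomial(2, 0.5, size=(n_train + n_test, p)).astype(float)
X -= X.mean(axis=0)
beta = np.zeros(p)
qtl = rng.choice(p, n_qtl, replace=False)
beta[qtl] = rng.choice([-1.0, 1.0], n_qtl)
tbv = X @ beta
tbv /= tbv.std()
y = tbv + rng.normal(0.0, 1.0, tbv.size)
X_tr, X_te = X[:n_train], X[n_train:]
y_tr = y[:n_train] - y[:n_train].mean()

# GWA-style: a separate regression per SNP, keep Bonferroni-significant hits
d = (X_tr ** 2).sum(axis=0)
b = X_tr.T @ y_tr / d
s2 = (y_tr @ y_tr - b ** 2 * d) / (n_train - 2)
z = b / np.sqrt(s2 / d)
p_values = np.array([math.erfc(abs(zj) / math.sqrt(2.0)) for zj in z])
hits = p_values < 0.05 / p
gwa_pred = X_te[:, hits] @ b[hits]

# Whole-panel: all SNP fitted jointly by ridge regression (SNP-BLUP),
# with the variance ratio assumed known from the simulation
sigma_b2 = 1.0 / (p * 0.5)       # genetic variance / sum of SNP variances
lam = 1.0 / sigma_b2             # residual variance / sigma_b2
beta_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)
ridge_pred = X_te @ beta_hat


def accuracy(pred, target):
    """Correlation with true breeding values; 0.0 if pred is constant."""
    if np.std(pred) == 0.0:
        return 0.0
    return float(np.corrcoef(pred, target)[0, 1])


gwa_r = accuracy(gwa_pred, tbv[n_train:])
ridge_r = accuracy(ridge_pred, tbv[n_train:])
print(int(hits.sum()), round(gwa_r, 2), round(ridge_r, 2))
```

With many small QTL, few SNP pass the Bonferroni threshold, so the GWA-based predictor typically captures little of the genetic variance, whereas the joint fit draws on every SNP; the exact figures depend on the assumed simulation settings.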
