Abstract
Genome-wide breeding value (GWEBV) estimation methods can be classified by the prior distribution they assume for marker effects. Genome-wide BLUP methods assume a normal prior distribution with a constant variance for all markers, and are computationally fast. Bayesian methods apply more flexible prior distributions of SNP effects that allow for a few very large SNP effects while most are small or even zero, but these methods are often computationally demanding because they rely on Markov chain Monte Carlo (MCMC) sampling. In this study, we adopted the Pareto principle to weight available marker loci, i.e., we consider that x% of the loci explain (100 - x)% of the total genetic variance. Under this principle, it is also possible to define the variances of the prior distributions of the 'big' and 'small' SNP: the relatively few large SNP explain a large proportion of the genetic variance, whereas the majority of the SNP show small effects and explain a minor proportion of the genetic variance. We name this method MixP, where the prior distribution is a mixture of two normal distributions, i.e., one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. This method also reduces the number of hyper-parameters of the prior distribution from 2 (proportion and variance of SNP with big effects) to 1 (proportion of SNP with big effects), assuming the overall genetic variance is known. The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced computation loads by two orders of magnitude. In the era of marker densities reaching millions and of whole-genome sequence data, MixP provides a computationally feasible Bayesian method of analysis.
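To illustrate how the Pareto split could fix both prior variances from a single hyper-parameter, the following is a minimal sketch and not the paper's implementation. It assumes that marker genotypes are standardized so that each locus contributes its effect variance directly to the genetic variance (allele-frequency scaling is ignored), and that expected rather than integer locus counts are used. The function name `pareto_variances` and its parameters are hypothetical.

```python
def pareto_variances(n_markers, total_genetic_variance, big_fraction):
    """Per-SNP prior variances under a Pareto-style split (illustrative only).

    Assumption: a fraction `big_fraction` of loci ('big' SNP) explains
    (1 - big_fraction) of the total genetic variance, and the remaining
    loci ('small' SNP) explain the rest. Genotypes are assumed standardized,
    so per-locus variance shares simply add up to the total.
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    if not 0.0 < big_fraction < 1.0:
        raise ValueError("big_fraction must lie strictly between 0 and 1")

    # Expected number of loci in each mixture component.
    n_big = big_fraction * n_markers
    n_small = (1.0 - big_fraction) * n_markers

    # Variance share of each component, divided evenly over its loci.
    var_big = (1.0 - big_fraction) * total_genetic_variance / n_big
    var_small = big_fraction * total_genetic_variance / n_small
    return var_big, var_small


if __name__ == "__main__":
    var_big, var_small = pareto_variances(
        n_markers=1000, total_genetic_variance=1.0, big_fraction=0.2
    )
    print(var_big, var_small)
```

Under these assumptions, 1000 markers with unit genetic variance and 20% 'big' loci give a prior variance of 0.8/200 = 0.004 per big SNP and 0.2/800 = 0.00025 per small SNP, so the two components together recover the total genetic variance. Only the proportion of big SNP has to be chosen, which matches the reduction from 2 hyper-parameters to 1 described above.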
Highlights
Genomic selection (GS) is currently being adopted by the dairy cattle breeding industries around the world [1].
The aim of this paper is to present this novel approach, which applies the Pareto principle to individual marker loci (MixP), and to compare it with other single-SNP-based genome-wide breeding value (GWEBV) prediction methods in a real Norwegian Red cattle (NRF) pedigree and on a real wheat dataset.
Accuracy of GWEBV estimations was measured by the correlation coefficient between GWEBV and true genotype values.
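The accuracy measure above can be sketched as a plain Pearson correlation. This is an illustrative sketch only: it assumes the correlation coefficient meant here is Pearson's, and the function name `gwebv_accuracy` and its inputs are hypothetical.

```python
import math


def gwebv_accuracy(gwebv, true_values):
    """Pearson correlation between estimated GWEBV and true genotype values."""
    n = len(gwebv)
    if n != len(true_values):
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two individuals are required")

    mean_est = sum(gwebv) / n
    mean_true = sum(true_values) / n

    # Centered cross-products and sums of squares.
    cov = sum((e - mean_est) * (t - mean_true) for e, t in zip(gwebv, true_values))
    ss_est = sum((e - mean_est) ** 2 for e in gwebv)
    ss_true = sum((t - mean_true) ** 2 for t in true_values)

    if ss_est == 0.0 or ss_true == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(ss_est * ss_true)
```

In a simulation study the true genotype values are known by construction, so this correlation can be computed directly for the evaluated individuals.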
Summary
Genomic selection (GS) is currently being adopted by the dairy cattle breeding industries around the world [1]. Genome-wide breeding value (GWEBV) prediction plays a pivotal role in this new technology. Its accuracy depends on the statistical methods used, the genome, the population structure, and trait heritability. GWEBV estimation methods are categorized based on the assumptions of their prior distributions of marker effects. Genome-wide BLUP methods [2] assume a normal prior distribution for all marker loci with a constant variance. In Bayesian methods, a more flexible prior distribution of SNP effects can be applied that allows for a few very large SNP effects whilst most are small or even zero. Bayesian methods often use Markov chain Monte Carlo (MCMC) algorithms, which make them computationally demanding.