Abstract
BackgroundNon-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model.MethodsThe BayesC model assumes a priori that markers have normally distributed effects with probability uppi and no effect with probability (1 − uppi ). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated.ResultsSVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term.ConclusionsBased on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
Highlights
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when wholegenome sequence (WGS) data are analyzed
Assuming that the Singular value decomposition (SVD) of the marker genotype matrix is already performed for other reasons, computation times for the BayesC predictions were comparable to those of SNP-BLUP
The latter model assumes with some prior probability π that the marker effect comes from a prior distribution and with probability (1 − π) that the marker effect has no effect on the trait
Summary
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when wholegenome sequence (WGS) data are analyzed. In genomic selection (GS), matrix X contains the marker genotypes and the number of marker effects (k) can greatly exceed the number of phenotypic records, especially in the case of whole-genome sequence (WGS) data. The marker effects can be assumed to have a normal distribution, as in the single nucleotide polymorphism best linear unbiased prediction (SNP-BLUP) model, or they can be assumed to come from a mixture of two distributions with one of them having all its probability density at zero [2]. In sequence data, this makes sense biologically, since the causal variates are expected to be contained in the sequence, among many non-causal variates [6] For these models, straightforward application of PCR does not seem very sensible because all principle components are assumed to be affected by all variates, i.e. PCR does not reduce the number of genotypes involved in the prediction
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.