Abstract

BackgroundUse of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.ResultsWe used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, 1060 up to 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1 % higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar or up to 1.4 % lower compared to that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.ConclusionsThe split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.Electronic supplementary materialThe online version of this article (doi:10.1186/s12711-016-0225-x) contains supplementary material, which is available to authorized users.

Highlights

  • Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time

  • Plotting the prediction accuracies against the variance of the genomic estimated breeding values (GEBV) revealed that prediction accuracies that were similar to those obtained with the 50kSNP chip were associated with a slightly smaller variance of the GEBV than that with the 50k-single nucleotide polymorphism (SNP) chip (Fig. 2)

  • We showed that the SAM approach, combined with the Bayesian stochastic search variable selection (BSSVS) model, was able to perform genomic prediction efficiently using whole-genome sequence data

Read more

Summary

Introduction

Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. An important limitation for genomic prediction using whole-genome sequence data, is the required computation time due to the sharp increase in SNP number, e.g. from 50k to over 10 million. Results from empirical studies with imputed sequence data that included all imputed SNPs together in one genomic prediction model, showed little or no improvement over the use of 50k or higherdensity SNP chips, even when a Bayesian variable selection model was used, which is expected to identify the SNPs associated with the trait of interest [16, 17]. Computation time increases proportionally to the sharp increase in p

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call