Use Of Whole-genome Sequence Data Research Articles

Background: Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.

Results: We used a dataset that included 4,154,064 variants after editing and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval from first to last insemination. In the first step, BSSVS was performed on 106 subsets each containing ~39,189 variants. In the second step, between 1060 and 472,492 variants, selected from the first step, were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1% higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar to, or up to 1.4% lower than, that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.

Conclusions: The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.

Electronic supplementary material: The online version of this article (doi:10.1186/s12711-016-0225-x) contains supplementary material, which is available to authorized users.
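The two-step procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Ridge regression stands in for BSSVS, variants are assigned to subsets at random, and the variants with the largest absolute estimated effects are carried into the merge step; all three choices are assumptions made for illustration, as are the names `ridge_effects` and `split_and_merge` and the simulated data.

```python
import numpy as np

def ridge_effects(X, y, lam):
    """Ridge marker effects (an assumed stand-in for BSSVS, not the paper's model)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Dual (n x n) form of ridge regression: cheaper when variants >> animals.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(X.shape[0]), yc)
    return Xc.T @ alpha

def split_and_merge(X_ref, y_ref, X_val, n_subsets, n_keep, lam=10.0, seed=0):
    rng = np.random.default_rng(seed)
    mu_x = X_ref.mean(axis=0)
    # Step 1 (split): analyse each subset of variants separately.
    # Random assignment of variants to subsets is an assumption of this sketch.
    subsets = np.array_split(rng.permutation(X_ref.shape[1]), n_subsets)
    selected, subset_preds = [], []
    for idx in subsets:
        b = ridge_effects(X_ref[:, idx], y_ref, lam)
        # Keep the n_keep variants with the largest absolute effect (assumed rule).
        selected.append(idx[np.argsort(-np.abs(b))[:n_keep]])
        subset_preds.append(y_ref.mean() + (X_val[:, idx] - mu_x[idx]) @ b)
    # Step 2 (merge): refit on the union of the selected variants.
    merged = np.concatenate(selected)
    b_m = ridge_effects(X_ref[:, merged], y_ref, lam)
    pred_merged = y_ref.mean() + (X_val[:, merged] - mu_x[merged]) @ b_m
    # Alternative predictor: average of the per-subset predictions.
    pred_avg = np.mean(subset_preds, axis=0)
    return merged, pred_merged, pred_avg

# Small simulated example (hypothetical data, far smaller than the study's).
rng = np.random.default_rng(1)
n_ref, n_val, n_var = 60, 20, 200
X = rng.integers(0, 3, size=(n_ref + n_val, n_var)).astype(float)
true_b = np.zeros(n_var)
true_b[:5] = 1.0
y = X @ true_b + rng.normal(size=n_ref + n_val)
merged, pred_merged, pred_avg = split_and_merge(
    X[:n_ref], y[:n_ref], X[n_ref:], n_subsets=4, n_keep=5
)
print(len(merged), np.corrcoef(pred_avg, y[n_ref:])[0, 1])
```

Because the per-subset fits in step 1 do not depend on each other, the loop could be distributed across workers, which is the property the abstract credits for the reduction in computing time.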


Background: In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data.

Methods: Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.

Results: Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.

Conclusions: Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

Electronic supplementary material: The online version of this article (doi:10.1186/s12711-015-0149-x) contains supplementary material, which is available to authorized users.
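To make the GBLUP and training/validation design mentioned in this abstract more concrete, here is a minimal Python sketch on simulated data. It builds a genomic relationship matrix with VanRaden's first method, predicts the validation animals from the training animals, and reports the squared correlation between predictions and phenotypes. The abstract does not give the exact reliability formula, so the squared correlation is used here only as a simple stand-in; the variance ratio `lam`, the function names, and the data are all assumptions.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden method 1)."""
    p = M.mean(axis=0) / 2.0          # allele frequency per SNP
    Z = M - 2.0 * p                   # centre genotypes by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train, val, lam):
    """Predict validation animals; lam is the assumed ratio of residual to genetic variance."""
    mu = y_train.mean()
    G_tt = G[np.ix_(train, train)]    # relationships among training animals
    G_vt = G[np.ix_(val, train)]      # validation-by-training relationships
    a = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_train - mu)
    return mu + G_vt @ a

# Simulated example (hypothetical data, far smaller than the 3416/2087 split).
rng = np.random.default_rng(2)
n, m = 150, 400
M = rng.integers(0, 3, size=(n, m)).astype(float)
effects = rng.normal(scale=0.1, size=m)
y = M @ effects + rng.normal(size=n)
train, val = np.arange(100), np.arange(100, 150)

G = vanraden_g(M)
pred = gblup_predict(G, y[train], train, val, lam=1.0)
r2 = np.corrcoef(pred, y[val])[0, 1] ** 2   # simple stand-in for reliability
print(round(r2, 3))
```

In practice `lam` would come from estimated variance components rather than a fixed value, and the validation statistic would follow whatever definition of reliability the study uses.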


Articles published on Use Of Whole-genome Sequence Data

Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection.

Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels.

Genomic selection: A paradigm shift in animal breeding

Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.

Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle.

Data Re-Identification: Prioritize Privacy
