Abstract

Background: Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants on the reliability of genomic prediction for fertility, health and longevity in dairy cattle.

Results: All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data, and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. The reliability of prediction obtained by using 50 k SNP data alone was used as the reference value, and absolute changes in reliability are referred to as changes in percentage points. Adding a component that included either all the genic RLFV or a subset of selected RLFV to the model, in addition to the 50 k component, changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV to the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV), randomly selected from different numbers of genes, were generated; they accounted for additional genetic variance equal to 10% of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0%, and when the causal RLFV were included they improved by up to 6.8%.

Conclusions: Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV.
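The two quantitative steps named in the abstract, extracting variants with a minor allele frequency below 0.05 and simulating causal RLFV that add 10% of the variance explained by the 50 k SNPs, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy genotype matrix, the normally distributed effects and the rescaling step are all assumptions made for illustration, and the abstract does not state how effects were sampled.

```python
import numpy as np


def minor_allele_freq(genotypes):
    """MAF per variant from an (animals x variants) matrix of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


def select_rlfv(genotypes, maf_threshold=0.05):
    """Indices of polymorphic variants with MAF below the threshold (0.05 in the abstract)."""
    maf = minor_allele_freq(genotypes)
    return np.flatnonzero((maf > 0.0) & (maf < maf_threshold))


def simulate_rlfv_values(genotypes, causal_idx, var_50k, extra_fraction=0.10, rng=None):
    """Genetic values from causal RLFV, rescaled so that their variance equals
    extra_fraction * var_50k. Normal effects are an assumption of this sketch."""
    rng = np.random.default_rng(rng)
    effects = rng.normal(size=causal_idx.size)
    g = genotypes[:, causal_idx] @ effects
    scale = np.sqrt(extra_fraction * var_50k / g.var())
    return g * scale, effects * scale


# Toy data (hypothetical): 1000 animals, 500 variants with mixed allele frequencies.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.005, 0.5, size=500)
genotypes = rng.binomial(2, freqs, size=(1000, 500))

rlfv_idx = select_rlfv(genotypes)
g_rlfv, effects = simulate_rlfv_values(genotypes, rlfv_idx, var_50k=1.0, rng=1)
```

Here `var_50k` stands in for the estimated genetic variance explained by the 50 k SNPs; in a real analysis it would come from a fitted model rather than being set to 1.0.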

Highlights

  • Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle

  • For the scenario that showed the largest improvement in reliability of prediction (i.e. 0.7% obtained by adding RLFV with medium-to-high impact annotations for the health index), we tested whether the improvement of the reliability of prediction was the result of increasing the number of RLFV in the model

  • The variance explained by the RLFV was largest when all genic RLFV were included in the prediction model, followed by the scenarios with RLFV with medium and high impact annotations, while the genic RLFV selected by association mapping always explained the smallest genetic variance

Summary

Introduction

Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The SNP chips that are routinely used in genomic prediction in dairy cattle include mostly SNPs with a relatively high minor allele frequency (MAF) that can efficiently tag common variants. Including RLFV in genomic prediction, through (imputed) sequence data, might increase the reliability of genomic prediction. This may be especially the case for fitness traits, since alleles with deleterious effects are expected to strongly affect fitness and to be rare due to purging from the population. Whether including subsets of RLFV from imputed whole-genome sequence data, selected by different strategies, in addition to e.g. 50 k SNP chip data improves the reliability of genomic prediction in dairy cattle has not been investigated.
