Abstract

BackgroundMillions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project.MethodsCandidate sequence variants for 444 Holstein animals were combined with high-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls. Test 1 included single nucleotide polymorphisms (SNPs) for 481,904 candidate sequence variants. Test 2 also included 249,966 insertions-deletions (InDels). After merging sequence variants with 312,614 HD SNPs and editing steps, Tests 1 and 2 included 762,588 and 1,003,453 variants, respectively. Imputation quality from findhap software was assessed with 404 of the sequenced animals in the reference population and 40 randomly chosen animals for validation. Their sequence genotypes were reduced to the subset of genotypes that were in common with HD genotypes and then imputed back to sequence. Predictions were tested for 33 traits using 2015 data of 3983 US validation bulls with daughters that were first phenotyped after August 2011.ResultsThe average percentage of correctly imputed variants across all chromosomes was 97.2 for Test 1 and 97.0 for Test 2. Total time required to prepare, edit, impute, and estimate the effects of sequence variants for 27,235 bulls was about 1 week using less than 33 threads. Many sequence variants had larger estimated effects than nearby HD SNPs, but prediction reliability improved only by 0.6 percentage points in Test 1 when sequence SNPs were added to HD SNPs and by 0.4 percentage points in Test 2 when sequence SNPs and InDels were included. However, selecting the 16,648 candidate SNPs with the largest estimated effects and adding them to the 60,671 SNPs used in routine evaluations improved reliabilities by 2.7 percentage points.ConclusionsReliabilities for genomic predictions improved when selected sequence variants were added; gains were similar for simulated and real data for the same population, and larger than previous gains obtained by adding HD SNPs. With many genotyped animals, many data sources, and millions of variants, computing strategies must efficiently balance costs of imputation, selection, and prediction to obtain subsets of markers that provide the highest accuracy.

Highlights

  • Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays

  • VanRaden et al Genet Sel Evol (2017) 49:32 can be targeted to specific traits such as genes related to fertility, thereby slightly improving reliability for daughter pregnancy rate by 0.2 percentage points when 39 single nucleotide polymorphisms (SNPs) were added to the marker set used for genomic prediction [9]

  • The reliability gain of 2.3 percentage points obtained for the 600 k compared with the 60 k SNPs is larger than reported earlier from either simulated (0.9) or actual (0.4) genomic predictions [12]

Read more

Summary

Introduction

Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project. Imputing, selecting, and predicting effects for millions of variants and many thousands of individuals require efficient computation. Strategies to choose variants for inclusion on genotyping arrays of different densities or in routine predictions were developed and compared using simulated data for Holstein bulls. We first examined simulated data and real sequence genotypes from the 1000 Bull Genomes Project [10]

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.