Abstract

Background: The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings.

Methods: We used data from four pig populations of different size (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced (most of them at 1× or 2× and 37–92 individuals per population, totalling 284, at 15–30×). We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We simulated data that mimicked the sequencing strategy used in the real populations to quantify the factors that affected the individual-wise and variant-wise imputation accuracies using regression trees.

Results: Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, due to the lack of marker array data for themselves and their ancestors. The main factors that determined the individual-wise imputation accuracy were the genotyping status, the availability of marker array data for immediate ancestors, and the degree of connectedness to the rest of the population, but sequencing coverage of the relatives had no effect. The main factors that determined variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. Results were validated with the empirical observations.

Conclusions: We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful strategy for generating whole-genome sequence data with high accuracy in large pedigreed populations where only a small fraction of individuals (2%) had been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants.
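
The individual-wise and variant-wise accuracies mentioned above can be illustrated with a minimal sketch. It assumes that accuracy is a plain Pearson correlation between true genotypes (coded 0, 1, 2) and imputed allele dosages, computed per row (individual) or per column (variant); the function name `dosage_correlations` and the toy matrices are hypothetical, and the exact definition used in the paper (for example, any correction for allele frequency) may differ.

```python
import numpy as np


def _row_pearson(a, b):
    """Pearson correlation between matching rows of a and b (NaN if a row has no variance)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    out = np.full(num.shape, np.nan)
    # Only divide where both rows vary; constant rows keep the NaN default.
    np.divide(num, den, out=out, where=den > 0)
    return out


def dosage_correlations(true_geno, imputed_dosage):
    """Return (individual_wise, variant_wise) dosage correlations.

    Both inputs are (n_individuals, n_variants) arrays: true genotypes
    coded 0/1/2 and imputed dosages in the range [0, 2].
    """
    t = np.asarray(true_geno, dtype=float)
    d = np.asarray(imputed_dosage, dtype=float)
    individual_wise = _row_pearson(t, d)    # one value per individual
    variant_wise = _row_pearson(t.T, d.T)   # one value per variant
    return individual_wise, variant_wise


# Toy data (made up for illustration): 3 individuals x 4 variants.
true_geno = np.array([[0, 1, 2, 1],
                      [2, 1, 0, 0],
                      [1, 1, 2, 0]])
imputed = np.array([[0.1, 1.0, 1.9, 0.8],
                    [1.8, 1.0, 0.2, 0.1],
                    [1.2, 1.0, 1.7, 0.3]])

ind_acc, var_acc = dosage_correlations(true_geno, imputed)
print("median individual-wise correlation:", np.nanmedian(ind_acc))
print("variant-wise correlations:", var_acc)
```

In this sketch a variant at which every individual carries the same genotype has no variance, so its correlation is reported as NaN rather than as a number; such monomorphic sites would have to be handled separately in a variant-wise summary.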

Highlights

  • The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding

  • We found that a combination of an appropriate sequencing strategy and hybrid peeling achieved high imputation accuracies without any intermediate imputation steps being required for the low density (LD) individuals

  • We demonstrate the high accuracy of hybrid peeling for imputing whole-genome sequence data of hundreds of thousands of individuals from real livestock populations in which only a small fraction of the individuals (2%) had been sequenced, mostly at low coverage

Summary

Introduction

The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. Due to the implementation of genomic selection in livestock breeding populations, many individuals in breeding nucleus populations have already been genotyped with marker arrays. This genotype data can be used to identify the individuals that share haplotype segments and to select individuals for sequencing that will be more informative from an imputation perspective given a limited budget [13, 14].

