Abstract

BackgroundThe advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on Bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.ResultsA combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red respectively) but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only lead to a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).ConclusionCombining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.

Highlights

  • The advent of low cost generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data

  • The mismatches between High Density (HD) and sequence data suggests that the best calls based solely on individual sequence information, does not necessarily represent the actual variants, which might cause a negative bias in the assessment of imputation accuracy

  • High density single nucleotide polymorphism (SNP) data concordance between HD and sequence variants increased to an average of 99.86% with a range from 99.16% to 99.96% when the comparison was done based on best calls from BEAGLE posterior genotype probabilities instead of using the genotype probabilities obtained from the variant calling software, which strengthens this hypothesis

Read more

Summary

Introduction

The advent of low cost generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Since obtaining whole genome sequences until recently was very costly, genomic studies have mostly relied on single nucleotide polymorphism (SNP) marker arrays, where only a small fraction of polymorphisms are variants are included in the panel tested for association. The cost is not so low that it is possible to just sequence all the individuals of interest Most of these individuals will be genotyped with SNP panels and a large part of the genomic data will have to be inferred with genotype imputation methods. A large part of the information in the database is dominated by Holstein bulls, but genome sequences from other dairy and beef breeds are available

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call